EDITORIAL


Systemic equity in education 


Too often in higher education, the legacy of laws,
policies, and practices that have systematically 
denied educational opportunities to Blacks is ig- 
nored, thereby perpetuating racial inequities. In 
the United States, higher education is a key route 
to career success and upward socioeconomic 
mobility. Unfortunately, this path is increasingly 
becoming most accessible to privileged communities. 
As the new president of Olin College of Engineering in 
Massachusetts, and as a woman of color, I am in a posi- 
tion to help unburden higher education from systemic 
racism and promote positive change that extends be- 
yond academic boundaries. 

My parents instilled in me the importance of educa- 
tion for personal and familial uplifting as well as a means 
of helping other Black Americans to 
achieve success. They reminded me 
that all people are created equal and 
have inalienable rights—a right to edu- 
cation among them. At a young age, I 
realized why they tried to enforce this 
notion. I vividly recall that as a third 
grader in 1963, I had to walk past 
a newly built all-white school to be 
picked up and bused to a dilapidated 
all-Black school in another part of 
Panama City, Florida. I wondered what 
it was like inside. Surely the pristine 
brick exterior and the well-appointed 
playground were indicators that, 
within those walls, white students had 
new and current textbooks, unlike the 
worn and outdated ones in my Black 
school. I wondered what justification 
there was for denying Blacks the same educational expe- 
riences as those afforded to whites. On the bus, I saw the 
stark contrast as we traveled from an integrated to a seg- 
regated neighborhood. As we turned down the dirt road 
leading to the Black school, I remember a sense of moving 
between two very different worlds. 

Separate worlds indeed, but not equal. The U.S. Su- 
preme Court ruling on Plessy v. Ferguson in 1896 le- 
galized “separate but equal” educational institutions 
and opportunities for Blacks. Even though the land- 
mark decision of Brown v. Board of Education in 1954 
declared “separate but equal” to be unconstitutional, 
many schools remained segregated, including the one 
in Florida near where my military family lived nearly 
10 years later. In higher education, historically Black 
colleges and universities (HBCUs) were established in 
the United States in the early 19th century for Blacks
to obtain advanced degrees. Until Brown, most college-
educated Blacks graduated from HBCUs.

I eventually became the first Black student to get a 
doctorate in chemical engineering from Rice Univer- 
sity; the fifth woman in the nation to obtain that de- 
gree; and the first Black woman in the country to hold 
a tenure-track position in chemical engineering. But it 
is discouraging that the challenges that existed along 
my journey remain challenges faced today by Black stu- 
dents interested in pursuing careers in science, technol- 
ogy, engineering, and mathematics. There is still a lack 
of diversity among faculty and students in engineering 
schools. This environment has negative consequences 
and feeds a vicious cycle. The dearth of Black faculty role 
models and mentors contributes to the underrepresenta- 
tion of Black students. Structural and 
social barriers such as hostile climates, 
bias, and tokenism make it difficult 
to achieve a sense of belonging and 
limit career choices and opportunities 
for Black students and faculty, further 
perpetuating the persistent underrep- 
resentation. Today, 3.9% of students in 
the United States who graduate with 
a bachelor’s degree in engineering are 
Black. And only 4.1% of students who 
graduate with a Ph.D. in engineering 
in the nation are Black. 

Dismantling systemic racism in 
higher education will require efforts 
to think and operate in new ways be- 
yond existing programs that support 
students of color—those efforts are 
typically targeted to individuals, and 
what’s needed in addition are efforts that promote insti- 
tutional change. Engineering colleges are a good place 
for breaking things down and rebuilding. Olin, for ex- 
ample, is committed to applying a co-creation model 
of change (where students, faculty, and administration 
work together) that relies on a combination of leader- 
ship, shared responsibility and accountability, coura- 
geous and effective discourse, mutual understanding, 
community engagement, and design approaches that 
have the potential for meaningful change. The lessons 
learned in our process of experimentation and discov- 
ery hopefully can be shared to help other colleges inter- 
ested in achieving similar goals. 

It’s time to abandon the myth that students and fac- 
ulty of color can’t be found. Higher education must 
challenge the status quo. 

-Gilda A. Barabino 


Gilda A. Barabino is the president of Olin College of Engineering, Needham, MA, USA. gbarabino@olin.edu
ASTRONOMY


Telescope’s giant sensor snaps most detailed photo ever 


Talk about a sharper image: A recently constructed imag-
ing sensor array (above) that will be used when the Vera
C. Rubin Observatory in Chile opens in 2021 has cap-
tured a world-record 3200 megapixels in a single shot.
It recorded a variety of objects, including a Romanesco
broccoli, at that resolution, which is detailed enough to
show a golf ball clearly from 24 kilometers away. The sensor ar-
ray’s focal plane is more than 60 centimeters wide, much larger
than the 3.5-centimeter sensors on high-end consumer digital
cameras, says the SLAC National Accelerator Laboratory, which
built the array. When the telescope, funded by the U.S. National
Science Foundation, begins operating next year, it will image
the entire southern sky every few nights for 10 years, catalogu-
ing billions of galaxies each time. The surveys will shed light on
mysterious dark energy and dark matter, which make up most
of the universe’s mass. With its repeat coverage, the telescope
will make the equivalent of an astronomical movie in order to
discover objects that suddenly appear, move, or go bang.


WHO endorses COVID-19 drugs


BIOMEDICINE | Corticosteroids given orally
or intravenously should be the standard
therapy for people with “severe and critical”
COVID-19, the World Health Organization
(WHO) said in new guidelines issued last
week, but they should not be given to
patients with mild cases. In June, a large
U.K. trial named Recovery first showed
that the steroid dexamethasone cut deaths
among ventilated COVID-19 patients by
35% after 28 days of treatment. That result
was confirmed by a WHO-sponsored meta-
analysis published in JAMA on 2 September
that included Recovery and six other stud-
ies testing dexamethasone, as well as two
other corticosteroids: hydrocortisone and
methylprednisolone. Many countries, includ-
ing the United States, had already included
corticosteroids in their national treatment
guidelines. But WHO’s recommendations will
be important as a signal to low- and middle-
income countries, says Martin Landray, one
of Recovery’s principal investigators.


Virus may move through pipes


PUBLIC HEALTH | COVID-19 virus particles
drifting through a Chinese apartment build-
ing’s plumbing may have infected some
residents, a study has found, raising fears
of yet another way that the disease could
spread. The case echoes a 2003 outbreak of
severe acute respiratory syndrome (SARS)
that spread through the pipes of a Hong
Kong apartment building. Such transmis-
sion is difficult to prove. But scientists
suspect that aerosolized coronavirus
may have spread from the bathroom of a
Guangzhou family of five through a floor
drain and into the building’s wastewater
pipes. Two middle-aged couples living in
apartments above the family later con-
tracted COVID-19. The study appeared last
week in Annals of Internal Medicine.


Africa’s ‘green wall’ rises slowly 


CONSERVATION | A plan to reforest a
cross-continental strip of Africa to hold back 
expansion of the Sahara Desert and the 
semi-arid Sahel has made little progress— 
even though the project is halfway toward its 
planned completion date in 2030, a report 
says. Participating countries have planted 
only 4 million hectares of trees and other 
vegetation for the Great Green Wall, well 
short of the 100 million planned to stretch 
7000 kilometers from Senegal to Djibouti, 
says the report by the Climatekos consulting
firm, presented on 7 September at a meeting 
of the countries’ ministers. Supporters pre- 
dicted the project would also create jobs and 
capture carbon dioxide. Scientists have said 
creating grasslands may be more effective 
than planting trees to resist desertification, 
The Guardian reported. 


Large gift for materials research 


PHILANTHROPY | Rice University last week 
received a $100 million gift for materi-
als science. It is the largest to date in that
discipline recorded in a database of gifts for 
engineering maintained by The Chronicle 
of Philanthropy. The funding will be used 
to pair materials science with artificial 
intelligence to advance the design and man- 
ufacturing of new materials, for applications 
that include sustainable water systems, 
energy, and telecommunications. The donor 
was the Robert A. Welch Foundation, which 
supports chemistry research in Texas. 


EU bans lead ammo in wetlands 


CONSERVATION | Scientists hailed a move 
last week by the European Union to ban 
the use of lead ammunition near wetlands 
and waterways. The European Chemicals 
Agency has estimated that as many as
1.5 million aquatic birds die annually
from lead poisoning because they swal- 
low some of the 5000 tons of lead shot 
that land in European wetlands each year. 
Its persistence in the environment is also 
considered a human health hazard. The 
EU Registration, Evaluation, Authorisation 
and Restriction of Chemicals (REACH) 
committee approved the ban after years 
of controversy. The German delegation, 
which had abstained in a July vote on the
issue, changed its stance to support the
measure after a letter from 75 scientists
and petitions signed by more than 50,000
people called for it to do so. The European
Commission and the European Parliament
are expected to formally approve the ban,
allowing it to go into effect in 2022. REACH
may debate a complete ban on lead ammu-
nition and fishing weights later this year.

One-third of white-tailed eagles whose deaths were recorded in
Germany were poisoned by lead shot, scientists found.


FEATURED INTERVIEW


A U.S. vaccine leader’s vow: Politics stays out


“I would immediately resign if there is undue interference in this process.” So said Moncef
Slaoui, scientific director of Operation Warp Speed, the U.S. effort to quickly develop a
vaccine for COVID-19, in an interview with Science.

To date, Warp Speed has invested more than $10 billion in eight vaccine candidates. Three are
now in large-scale efficacy trials, and interim reviews of their data by independent safety and
monitoring boards could reveal evidence of protection as early as October.

Slaoui, an immunologist who formerly headed vaccine development at GlaxoSmithKline,
answered questions from Science last week about how Warp Speed operates and addressed
concerns that political pressure before the 3 November U.S. presidential election may lead to
an emergency use authorization of a COVID-19 vaccine before it is proven safe and effective.
(On 8 September, nine companies developing vaccines for the pandemic coronavirus pledged
not to seek a premature authorization.)

“It needs to be absolutely shielded from the politics,” Slaoui says. “Trust me, there will be no
[authorization request] filed if it’s not right. ... The science is what is going to guide us. ... And
at the end of the day, the facts and the data will be made available to everyone who wants to
look at them and will be transparent.”

Slaoui defended Warp Speed’s decision to not consider vaccines made of whole,
inactivated viruses, a time-tested approach. China has three such vaccines in efficacy
trials, but he worries they could cause serious side effects in people who receive them.
Slaoui also said if it had been his choice, the United States would have participated in
COVAX, a mechanism for countries to invest collectively in vaccines and share them; the
Trump administration declined to join. The full interview, one of Slaoui’s most detailed
since taking the job in May, is at http://scim.ag/SlaouiQA.


Russian dissident poisoned 


CHEMICAL WEAPONS | Alexei Navalny,
a Russian opposition politician, was
poisoned with a nerve agent
“identified unequivocally in tests”
as a Novichok, an exotic Soviet-
era chemical weapon, German
Chancellor Angela Merkel said on
2 September. Navalny fell ill on
20 August after drinking a cup
of tea at a Siberian airport. He
was flown to Berlin and this week
emerged from a coma. German mil-
itary scientists at the Bundeswehr
Institute of Pharmacology and
Toxicology in Munich haven’t
released details of their tests, but
they had clear targets to hunt for:
Like other nerve agents, Novichoks
bind to the enzymes acetylcholin-
esterase and butyrylcholinesterase,
creating a telltale conjugate
compound. Novichok agents came
to wide public notice in 2018
after one was used in an assassination
attempt against former Russian spy Sergei 
Skripal in the United Kingdom. The attack 
prompted nations to push for a crackdown 
on Novichok agents, and last year they were 
added to the list of toxic chemicals regulated 
under the Chemical Weapons Convention. 


Depression follows lockdowns 


COVID-19 | In one of the largest surveys
of Americans since COVID-19 lock-
downs began, a majority reported having
some symptoms of depression, up from
one-quarter in a prepandemic survey.
The prevalence of symptoms graded as
moderate to severe tripled, to 27.8% of
respondents. A research team compared
results from two surveys used to screen for
depression: one administered to more than
5000 people in 2017 and 2018 by the U.S.
Centers for Disease Control and Prevention,
the other given to 1400 people in early
April by NORC at the University of Chicago.
Prevalence of depression symptoms rose
in all demographic groups and espe-
cially among individuals facing financial
problems, job loss, or family deaths. The
increases in self-reported symptoms are
larger than those recorded in previous
surveys after large-scale traumatic events
in other countries, including outbreaks
of the severe acute respiratory syndrome,
H1N1, and Ebola, the authors write in the
2 September issue of JAMA Network Open.
Why obesity worsens COVID-19


Even people in the overweight category face higher risk of serious disease 


By Meredith Wadman 


This spring, after days of flulike symp-
toms and fever, a man arrived at
the emergency room at the Univer-
sity of Vermont Medical Center. He
was young, in his late 30s, and
adored his wife and small children.
And he had been healthy, logging endless 
hours running his own small business, ex- 
cept for one thing: He had severe obesity. 
Now, he had tested positive for COVID-19 
and was increasingly short of breath. 

He was admitted directly to the inten- 
sive care unit (ICU) and was on a ventilator 
within hours. Two weeks later, he died. 

“He was a young, healthy, hardwork- 
ing guy,” recalls MaryEllen Antkowiak, 
a pulmonary critical care physician who 
is medical director of the hospital’s ICU. 
“His major risk factor for getting this sick 
was obesity.” 

Since the pandemic began, dozens of 
studies have reported that many of the 
sickest COVID-19 patients have been peo- 
ple with obesity. In recent weeks, that link 
has come into sharper focus as large new 
population studies have cemented the asso- 
ciation and demonstrated that even people 
who are merely overweight are at higher 
risk. For example, in the first metaanalysis 
of its kind, published on 26 August in Obesity 
Reviews, an international team of research-
ers pooled data from scores of peer-reviewed
papers capturing 399,000 patients. They 
found that people with obesity who con- 
tracted SARS-CoV-2 were 113% more likely 
than people of healthy weight to land in the 
hospital, 74% more likely to be admitted to 
an ICU, and 48% more likely to die. 
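
One way to read these figures is as multipliers on the healthy-weight baseline: "113% more likely" corresponds to a ratio of 2.13. The short Python sketch below does that arithmetic. It assumes the reported percentages are relative increases over the healthy-weight group; the function name is illustrative only, and the result says nothing about absolute risk.

```python
def percent_increase_to_ratio(percent_more_likely: float) -> float:
    """Convert an 'X% more likely' statement into a ratio (1 + X/100)."""
    return 1.0 + percent_more_likely / 100.0


# Percentages quoted in the article for people with obesity
# compared with people of healthy weight.
reported = {"hospitalization": 113, "ICU admission": 74, "death": 48}

for outcome, pct in reported.items():
    ratio = percent_increase_to_ratio(pct)
    print(f"{outcome}: {pct}% more likely = {ratio:.2f} times the baseline")
```

A ratio above 2 for hospitalization means more than double the baseline likelihood, which is easier to compare across outcomes than the percentage wording.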

A constellation of physiological and so-
cial factors drives those grim numbers. The
biology of obesity includes impaired immu-
nity, chronic inflammation, and blood that’s
prone to clot, all of which can worsen
COVID-19. And because obesity is so
stigmatized, people with obesity
may avoid medical care.

Many very sick COVID-19 patients, like some
in this Brazilian intensive care unit, have obesity.

Science’s COVID-19 reporting is supported by the Pulitzer Center and the Heising-Simons Foundation.

“We didn’t understand early on
what a major risk factor obesity
was. ... It’s not until more recently
that we’ve realized the devastating
impact of obesity, particularly in
younger people,” says Anne Dixon,
a physician-scientist who studies
obesity and lung disease at the University of
Vermont. That “may be one reason for the
devastating impact of COVID-19 in the United
States, where 40% of adults are obese.”

People with obesity are more likely than
normal-weight people to have other diseases
that are independent risk factors for severe
COVID-19, including heart disease, lung dis-
ease, and diabetes. They are also prone to
metabolic syndrome, in which blood sugar
levels, fat levels, or both are unhealthy and
blood pressure may be high. A recent study
from Tulane University of 287 hospitalized
COVID-19 patients found that metabolic syn-
drome itself substantially increased the risks
of ICU admission, ventilation, and death.

But on its own, “BMI [body mass index]
remains a strong independent risk factor”
for severe COVID-19, according to several
studies that adjusted for age, sex, social class,
diabetes, and heart conditions, says Naveed
Sattar, an expert in cardiometabolic disease
at the University of Glasgow. “And it seems to
be a linear line, straight up.”

The impact extends to the 32%
of people in the United States who
are overweight. The largest de-
scriptive study yet of hospitalized
U.S. COVID-19 patients, posted as
a preprint last month by Genen-
tech researchers, found that 77%
of nearly 17,000 patients hospital-
ized with COVID-19 were over-
weight (29%) or obese (48%). (The Centers
for Disease Control and Prevention defines
overweight as having a BMI of 25 to 29.9 ki-
lograms per square meter, and obesity as a
BMI of 30 or greater.)
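
The thresholds in that definition can be written out as a small calculation. The Python sketch below computes BMI as weight in kilograms divided by the square of height in meters and applies only the two cutoffs quoted above. The function names and the sample weight and height are illustrative assumptions, and values between 29.9 and 30 are treated as overweight.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(bmi_value: float) -> str:
    """Label a BMI using only the two thresholds quoted in the article."""
    if bmi_value >= 30:
        return "obese"
    if bmi_value >= 25:
        return "overweight"
    return "below overweight threshold"


# Hypothetical adult: 85 kg, 1.75 m tall.
value = bmi(85.0, 1.75)
print(round(value, 1), bmi_category(value))  # 27.8 overweight
```

The sketch deliberately leaves BMIs under 25 as a single unlabeled group, because the article gives no cutoffs below the overweight range.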

Another study captured the rate of 
COVID-19 hospitalizations among more 
than 334,000 people in England. Pub- 
lished last month in the Proceedings of 
the National Academy of Sciences, it found 
that although the rate peaked in people 
with a BMI of 35 or greater, it began to
rise as soon as someone tipped into the
overweight category (see graphic, below).
“Many people don’t realize they creep into
that overweight category,” says first author
Mark Hamer, an exercise physiologist at 
University College London. 

The physical pathologies that render 
people with obesity vulnerable to severe 
COVID-19 begin with mechanics: Fat in 
the abdomen pushes up on the diaphragm, 
causing that large muscle, which lies below 
the chest cavity, to impinge on the lungs 
and restrict airflow. This reduced lung 
volume leads to collapse of airways in the 
lower lobes of the lungs, where more blood 
arrives for oxygenation than in the upper 
lobes. “If you are already starting [with] 
this mismatch, you are going to get worse 
faster” from COVID-19, Dixon says. 

Other issues compound these mechanical 
problems. For starters, the blood of people 
with obesity has an increased tendency to 
clot—an especially grave risk during an in- 
fection that, when severe, independently 
peppers the small vessels of the lungs with 
clots (Science, 5 June, p. 1039). In healthy 
people, “the endothelial cells that line the 
blood vessels are normally saying to the sur- 
rounding blood: ‘Don’t clot,’” says Beverley
Hunt, a physician-scientist who’s an expert 
in blood clotting at Guy’s and St. Thomas’ 
hospitals in London. But “we think that sig- 
naling is being changed by COVID,” Hunt 
says, because the virus injures endothelial 
cells, which respond to the insult by activat- 
ing the coagulation system. 

Add obesity to the mix, and the clotting 
risk shoots up. In COVID-19 patients with 
obesity, Hunt says, “You’ve got such sticky 
blood, oh my—the stickiest blood I have 
ever seen in all my years of practice.” 

Immunity also weakens in people with 
obesity, in part because fat cells infiltrate the 
organs where immune cells are produced 
and stored, such as the spleen, bone mar- 
row, and thymus, says Catherine Andersen, a 
nutritional scientist at Fairfield University. 
“We are losing immune tissue in exchange 
for adipose tissue, making the immune sys- 
tem less effective in either protecting the 
body from pathogens or responding to a 
vaccine,” she says. 

The problem is not only fewer immune 
cells, but less effective ones, adds Melinda 
Beck, a co-author of the Obesity Reviews 
metaanalysis who studies obesity and im- 
munity at the University of North Carolina, 
Chapel Hill. Beck’s studies of how obese 
mice respond to the influenza virus demon- 
strated that key immune cells called T cells 
“don’t function as well in the obese state,” 
she says. They make fewer molecules that 
help destroy virus-infected cells, and the 
corps of “memory” T cells left behind after
an infection, which is key to neutralizing 
future attacks by the same virus, is smaller 
than in healthy weight mice. 

Beck’s work suggests the same thing 
happens in people: She found that people 
with obesity vaccinated against flu had 
twice the risk of catching it as vaccinated, 
healthy weight people. That means trials 
of vaccines for SARS-CoV-2 need to include 
people with obesity, she says, because 
“coronavirus vaccines may be less effective 
in those people.” 

Beyond an impaired response to infec- 
tions, people with obesity also suffer from 
chronic, low-grade inflammation. Fat cells 
secrete several inflammation-triggering 
chemical messengers called cytokines, and 
more come from immune cells called mac- 
rophages that sweep in to clean up dead 
and dying fat cells. Those effects may com- 
pound the runaway cytokine activity that 
characterizes severe COVID-19. “You end up
causing a lot of tissue damage, recruiting
too many immune cells, destroying healthy
bystander cells,” says Ilhem Messaoudi, an
immunologist who studies host responses 
to viral infection at the University of Cali- 
fornia, Irvine. Of the added risk from obe- 
sity, she adds: “I would say a lot of it is 
immune-mediated.” 

The severity of COVID-19 in people 
with obesity helps explain the pandemic’s 
disproportionate toll in some groups. In 
American Indians and Alaska Natives, for 
example, poverty, lack of access to healthy 
food, lack of health insurance, and poor 
exercise opportunities combine to render 
“rates of obesity ... remarkably high,” says 
Spero Manson, a Pembina Chippewa who 
is a medical anthropologist at the Univer- 
sity of Colorado’s School of Public Health. 
And obesity “is connected to all these other
[illnesses], such as diabetes and cardiovas- 
cular disease, rendering us susceptible” to 
severe COVID-19, Manson says. 

In addition, a large body of literature 
shows that people with obesity may delay 
seeking medical care due to fear of being 
stigmatized, increasing their likelihood 
of severe disease or death. “Patients that 
experience weight stigma are less likely 
to seek care and less likely to seek follow 
up because they don’t feel welcome in the 
health care environment,” says Fatima Cody
Stanford, an obesity medicine physician- 
scientist at Harvard Medical School and 
Massachusetts General Hospital. 

COVID-19-specific research on this ques- 
tion is urgently needed, she adds. “We don’t 
know how many people are dying in the 
community that are never making it in,”
Stanford says. “Maybe that was [due] to 
their weight or to their race, the two most 
prevalent forms of stigma in the U.S.” 

For people with obesity, the extra risk adds 
a psychological burden, says Patty Nece, vice 
chair of the Obesity Action Coalition. “My 
anxiety is just totally ramped up,” she says,
adding that because of stress eating she’s re- 
cently regained 30 of the 100 pounds she lost 
before the pandemic. “You have the general 
anxiety of this pandemic ... and then you 
layer on top of it: ‘You in particular, you 
could get really sick.’”

Data on how to treat COVID-19 patients 
with obesity are scant. Published evidence 
supports giving such patients higher doses 
of anticoagulants, says Scott Kahan, an obe- 
sity medicine physician who directs the Na- 
tional Center for Weight and Wellness. But 
very little is known about whether and how 
to adjust other treatments such as remde- 
sivir and dexamethasone, partly because 
patients with obesity “are often excluded 
from clinical trials,” he says. He urges that 
COVID-19 treatment trials include people 
with high BMIs wherever possible. 

People with obesity should take extra 
care to avoid getting sick, Messaoudi says. 
“If you are a person with obesity, be extra, 
extra cautious,” she says. “Wear your mask. 
Wash your hands. Avoid large gatherings.” 

In addition, exercising and, separately, 
losing even a little weight can improve the 
metabolic health of a person with obesity, 
and, in doing so, reduce their chances of 
developing severe COVID-19 if they be- 
come infected, says Stephen O’Rahilly, a 
physician-scientist who directs the MRC 
Metabolic Diseases Unit at the University 
of Cambridge. “If you’re 300 pounds, even 
losing a modest amount is likely to have a 
disproportionate benefit on how well you 
do with coronavirus infection. You don’t 
have to become a slim Jim to benefit.” 
VOICES OF THE PANDEMIC


Speaking science to power 


A young disease modeling expert has found her voice during the pandemic 


By Kelly Servick 


In May, epidemiologist Caitlin Rivers
made a rare outing amid coronavirus
stay-at-home orders. She had been called
for the first time in her career to testify
before Congress, and she was intimi-
dated. “You’re looking at the dais and
seeing all these eminent people. It’s a really 
powerful experience,” she says. 

Then, questions about the U.S. response 
to COVID-19 started to fly, and Rivers was in 
her element. Five years out of gradu- 
ate school, she is already well-versed 
in talking to policymakers about the 
science of pandemics. She has devel- 
oped models to predict the spread of 
Middle East respiratory syndrome 
and Ebola, briefed the Department of 
Defense (DOD) on outbreak response, 
and tracked respiratory disease among 
Army service members. She’s now at 
the Johns Hopkins Center for Health 
Security, a think tank that advises U.S. 
and international leaders on epidem- 
ics and disasters. 

In formal reports, private conversa- 
tions with congressional staffers and 
local officials, and a growing presence 
on Twitter and in the popular press, 
Rivers has emerged as a clear-eyed, 
tactful narrator of the unfolding pan- 
demic. “One of my goals,” she says, “is 
keeping the energy—the intention— 
around the bigger question, ‘Are we 
headed in the right direction?’”

Rivers got interested in epidemio- 
logy as an undergraduate at the Uni- 
versity of New Hampshire, inspired 
in part by Tracy Kidder’s book Moun- 
tains Beyond Mountains: The Quest of 
Dr. Paul Farmer, a Man Who Would 
Cure the World, which describes the 
medical anthropologist’s efforts to 
eradicate disease in developing coun- 
tries. Rivers admired “the respect that he 
brought to the populations that he was work- 
ing with,” she says, “and just the vision—he 
was not about to let anything stop him.” 

Rivers majored in anthropology, and she 
brings an “anthropologist’s understanding 
of how what seem to be totally different cul- 
tures can communicate with each other—the 
policy world and the modeling epidemio- 
logists,” says Stephen Eubank, an epidemio- 
logical modeler at the University of Virginia 
(UVA) who mentored Rivers during her grad-
uate training in epidemiology and infectious
disease at the Virginia Polytechnic Institute 
and State University (Virginia Tech). 

Her Ph.D. coincided with the outbreak of 
Ebola in West Africa, and in the lab of Vir- 
ginia Tech epidemiologist Bryan Lewis, she 
helped prepare weekly updates for experts at 
DOD. “Caitlin would often be emailing me at 
like three in the morning: ‘I updated this to 
get this little thing in! You can put this on 
slide 12!’” Lewis, now also at UVA, recalls.


“We must not become numb. Those numbers 
represent ... people who were loved.” 


Caitlin Rivers, Johns Hopkins Center for Health Security 


The demands of an epidemic are “well-suited 
to my personality,” Rivers says. “I don’t mind 
working hard, and I like having a purpose.”

As she sat before an appropriations sub-
committee in the House of Representatives 
in May, the country had made progress. 
Stay-at-home orders were starting to bring 
down new COVID-19 cases. But the nation 
was on the verge of widespread reopening 
that would put hard-won gains at risk. “We 
are in a critical moment of this fight,” she
told the representatives, warning that a
clear national plan for testing, contact trac- 
ing, and strengthening health care systems 
was essential to prevent tens of thousands 
more deaths. 

As early as March, Rivers, former Food 
and Drug Administration Commissioner 
Scott Gottlieb, and colleagues at the 
American Enterprise Institute had laid out 
criteria for safely reopening businesses, in- 
cluding waiting for a sustained reduction 
in cases. In her May congressional testi- 
mony, she urged the federal govern- 
ment to develop a national plan to 
eliminate test shortages and antici- 
pate bottlenecks in the supply of re- 
agents and materials. 

Things might have gone differ- 
ently if more people in positions of 
power had taken Rivers’s advice. Four 
months later, the United States still 
logs tens of thousands of new cases 
per day and accounts for about one- 
fifth of the COVID-19 deaths docu- 
mented worldwide. 

“Things did not unfold as I would 
have liked them to, certainly,”
Rivers says of the U.S. reopening. “Pol- 
itics can get so frustrating because it 
feels—not necessarily as an adviser, 
but as a citizen—like, ‘Why can’t you 
see it the way that I see it?’” But, she
adds, she’s sympathetic to the pres- 
sures that local decision-makers felt 
to restore their economies. 

Laying blame and stirring contro- 
versy isn’t productive for someone 
eager to influence policy, Eubank says, 
citing National Institute of Allergy and 
Infectious Diseases Director Anthony 
Fauci’s aversion to publicly discuss- 
ing his relationship with the Trump 
administration. Of course, Eubank 
adds, Fauci has decades of experience 
threading this needle. But Rivers un- 
derstands it too, and is holding her own just 
a few years out of grad school. 

“As a junior faculty, we don’t have anyone
helping. We don’t have staff,” says
Natalie Dean, a biostatistician at the Uni- 
versity of Florida who has co-authored 
editorials with Rivers on how to interpret 
antibody studies and the need for more de- 
tailed, transparent epidemiological data. “I 
think we’re both adjusting to just having so 
many more people ask things of us.”

It’s not just politicians who are turning to
Rivers for clarity on the pandemic. On Twitter,
which she previously used mostly to discuss
new results with colleagues, she’s made
an art of giving a big-picture, 280-character 
view to her followers, who now number 
more than 140,000. 

“Early in an outbreak, we often find only 
the most severe cases,” she tweeted in Feb- 
ruary. “It seems like people are quite sick, 
which is scary. It’s something of an illusion.” 

As some regions turned a corner in April, 
she predicted “growing agitation about 
whether staying home was necessary. Make 
no mistake, it is and was.” 

“We must not become numb,” she urged 
in July as the United States passed 150,000 
deaths. “Those numbers represent people, 
people who were loved.” 

Readers gravitate to these level-headed 
summaries even when the news is bad, 
says Dean, who describes Rivers as her 
“pandemic pal.” Their friendship was born 
on Twitter, she says, where they connected 
over the struggle of caring for young chil- 
dren while working from home. (Rivers has 
19-month-old twins and a 6-year-old.) 

Rivers admits the demands of the pan- 
demic have been “a lot to manage,” but she 
also sees opportunities, including the chance 
to revive a proposal that would better pre- 
pare the country for the next viral threat. 
While she was in graduate school, Rivers 
and colleagues proposed creating a National 
Infectious Disease Forecasting Center, akin 
to the National Weather Service, that would 
put a coordinated team of epidemic model- 
ing experts inside the government. 

Currently, academic experts largely vol- 
unteer their time. “There is no other capa- 
bility of national strategic importance that 
we handle like that,” she says. “We don’t let 
the military self-organize. We don’t let the 
national hurricane center be academics in 
various universities who volunteer.” 

In 2015, the proposal seemed to have a 
chance. Rivers, with colleagues including 
biodefense adviser Dylan George, then at 
the White House Office of Science and Tech- 
nology Policy, discussed the idea at a White 
House meeting on epidemic preparedness. 
But it never advanced to a formal initia- 
tive or a line in the federal budget. “We hit 
the budget cycle at the wrong time,” says 
George, who is now at the national security 
investment firm In-Q-Tel. 

COVID-19 has put new momentum behind 
the effort. Rivers says she has been meeting 
with congressional staff about it, and she is 
hopeful that the past efforts laid the ground- 
work even though they didn’t pay off in time 
to help with COVID-19. She wishes the ini- 
tiative had been launched in 2015, she says, 
“but the second best time is now.”

Narrow path charted for editing
genes of human embryos


Panel outlines most justifiable uses if safety is ensured 


By Jon Cohen 


When He Jiankui announced the
creation of the first gene-edited
babies in 2018, the work was
widely seen as dangerous, unethical,
and premature. Two years
later, an international committee
has concluded that the dangers remain too 
great for anyone to follow in He’s footsteps. 
But its report, released last week, also lays 
out rare circumstances that might justify 
“heritable human genome editing” (HHGE) 
and calls for a global scientific body to help 
countries assess future proposals. 

The committee, organized by the U.K.’s
Royal Society and two branches of the U.S. 
National Academies of Sciences, Engineer- 
ing, and Medicine, reviewed 
the latest on CRISPR and other 
ways to modify DNA and con- 
sulted scientists, physicians, 
ethicists, and patient groups. Its 
report, which Harvard Univer- 
sity genome-editing researcher 
David Liu calls “thoughtful, bal- 
anced, and well-bounded,” em- 
phasizes that making heritable 
genome changes remains too 
risky for now. “There are a lot of 
gaps in our knowledge and fur- 
ther research is needed,” says 
Kay Davies, a geneticist at the University of 
Oxford who co-chaired the commission. 

But Liu is uneasy with the report’s analy- 
sis of when and how embryo editing might 
be implemented. “I continue to struggle to 
imagine plausible situations in which clinical 
germline editing provides a path forward to 
address an unmet medical need.” 

The report largely steers clear of the com- 
plex social and ethical implications of creat- 
ing gene-edited babies. But it does call for an 
international panel of scientists to assess pro- 
posed uses of HHGE, provide regular updates 
about related technologies, and review clini- 
cal outcomes if an edited embryo implanted 
into a mother is born. 

It also categorizes uses of HHGE into a 
hierarchy ranging from potentially justifi- 
able to strictly off limits. The most justifiable 
use, the commission said, would be helping 
those rare couples who, even with in vitro 
fertilization (IVF) and screening of embryos
before implantation, have little or no chance 
of having a baby that does not inherit a ge- 
netic condition leading to “severe morbidity 
or premature death.” A couple in which one 
partner is homozygous for the Huntington 
disease mutation is an example; without 
intervention, their children will inherit the 
mutation and develop the fatal disease. 

Genetic diseases that have less serious 
effects and can be corrected or treated in 
other ways, such as deafness, rank lower. 
At the bottom—most taboo in the eyes of 
the panel—is the use of HHGE for genetic 
enhancement, creating children who are 
smarter, better at sports, or resistant to HIV, 
which was the goal of He’s experiments. 

If HHGE is allowed, the panel said, any 
embryo edit should only “specifically change 
one DNA sequence into a spe- 
cific desired sequence” that is 
common in “the relevant popu- 
lation.” This means the simplest, 
most frequently used form of 
CRISPR, which can cripple genes 
but does not fix them, should 
never be used in embryos. The 
panel also noted there may one 
day be a way to avoid the danger 
of unintended “off-target” DNA 
changes. Scientists have pro- 
posed editing the stem cells that 
produce human sperm or eggs 
before any embryo is created. Those gametes 
could then be tested for off-target changes 
before they are used for IVF. 

The report’s criteria for future use of 
HHGE are so stringent that “it is a ban on 
editing the genome of the embryo in princi- 
ple,” says Denis Rebrikov of Pirogov Russian 
National Research Medical University, who 
has pursued a project to correct a deafness 
mutation in embryos of couples who each 
have the aberrant gene. (Rebrikov has not 
moved forward because he is not yet satis- 
fied he can safely edit a human embryo.) 

Fyodor Urnov, a CRISPR researcher at the 
University of California, Berkeley, is glad the 
commission was so restrictive. “The careful 
guidelines laid out in this report show that 
the list of problems that could be addressed 
by such editing is, in fact, quite small,” he 
says. “It is an open secret in the gene-editing 
community that human reproductive editing 
is a solution in search of a problem.”

Indigenous Alaskans demand
a voice in research on warming


NSF program struggles to bridge scientists and communities 


By Richard Stone 


Climate scientist Darcy Peter is Gwich’in
Athabascan and hails from the Yukon 
River village of Beaver, Alaska, popula- 
tion 25. There, she says, “The concept 
of a grocery store is overwhelming.” 
She laments that climate change 
threatens the village’s subsistence economy. 
“The Yukon’s channels are changing like 
crazy” as its banks erode, and a major source 
of sustenance—king salmon—is dwindling. 

But Peter, who studies greenhouse gases 
and permafrost thaw at the Woodwell Cli- 
mate Research Center in Falmouth, Massa- 
chusetts, is just as dismayed that to many 
colleagues studying Arctic warming, its im- 
pact on Indigenous Alaskans is often “out 
of sight, out of mind,” despite a recently
launched U.S. National Science Foundation 
(NSF) initiative meant to change that. 

The Navigating the New Arctic (NNA) ini- 
tiative handed out its first round of grants 
totaling $37.5 million in October 2019, dou- 
bling the amount NSF spends on Arctic re- 
search. It aims to improve understanding of 
Arctic change, but also encourages scientists 
to enlist Indigenous communities in the “co- 
production of knowledge” by involving them 
in planning and executing projects. 

“I’m glad NSF went in that direction,” says
Kaare Erickson, North Slope science liaison
for Ukpeaġvik Iñupiat Corporation (UIC) in
Utqiaġvik, formerly Barrow. But NNA’s execution,
he says, has been flawed. NSF and
researchers “expected everyone to drop their 
guard and begin working together. They 
didn’t foresee the backlash they’d get.”

Many NNA projects ignore Indigenous 
Alaskans or include them as an afterthought, 
a coalition asserted in a letter sent to NSF 
this spring. “We continue to lack meaningful 
access and voice in the vast landscape that is 
the ‘research process,’” wrote Kawerak, Inc.,
a consortium of 20 tribes in the Bering Strait 
region, and three other organizations repre- 
senting dozens of Indigenous communities. 

NSF is urging outside scientists to take 
such concerns to heart—for starters, by grasp- 
ing the concept of knowledge coproduction. 
“We made a mistake in assuming that scien- 
tists knew what that meant,” says anthropolo- 
gist Colleen Strawhacker, program officer for 
NSF’s Arctic System Science Program. “We 
definitely have a lot of work to do to make 
sure that Arctic sciences is diversified and 
equitable.” In an open letter on 3 August, 
NSF’s Arctic Sciences Section, which funds a 
separate research slate from NNA, called for 
proposals “that will enrich interactions and 
improve collaboration between Arctic residents,”
including Indigenous-led projects.

Sea ice loss has sped up the shoreline erosion
threatening the Indigenous Alaskan village of Kivalina.

Few question the need to better understand
the impacts of climate change in
Alaska. Spawning salmon are dying from
heat stress, reducing catches. As winter sea 
ice grows sparser, Indigenous hunters must 
often travel farther over open water, at their 
peril, to reach walruses and seals hauling out. 
Diminished ice also means higher waves and 
storm surges that pummel shoreline villages. 
“Some parts of Alaska have the highest ero- 
sion rates on Earth,” says Thomas Douglas, a 
geochemist with the U.S. Army Cold Regions 
Research and Engineering Laboratory near 
Fairbanks. The assault forced Newtok, a Yu- 
pik village near the Bering Sea, to relocate 
last fall and is threatening others. 

This spring, Douglas came across another 
warning sign: a young moose that died after 
stumbling into a sinkhole that formed as per- 
mafrost thawed. Permafrost loss renders the 
ground suddenly permeable, “like unplug- 
ging the plug in your bathtub. We hear re- 
ports of fishing holes and freshwater sources 
draining overnight,” says Merritt Turetsky,
director of the Institute of Arctic and Alpine 
Research at University of Colorado, Boulder. 

As these challenges have unfolded, Indig- 
enous Alaskans have sought to be part of the 
solution. “For many decades,” the coalition 
wrote to NSF, “we have asked to be active 
partners with agencies and academics that 
wish to come onto our lands and waters to 
conduct research.” 

That plea is often ignored, says Lauren 
Divine, director of the Ecosystem Conserva- 
tion Office for the Aleut Community of St. 
Paul, a volcanic island in the Bering Sea. 
St. Paul is a microcosm of the upheaval the 
region is enduring, with heavy coastal ero- 
sion and mass die-offs of puffins and other 
seabirds. Scientists studying these woes 
sometimes seem to view Indigenous partici- 
pation as an exercise in ticking a box, says 
Divine, who is a marine biologist by train- 
ing. “We ended up just getting cold-called. 
Solicitations to hop onto a proposal without 
any thought for what funding would be di- 
rected to the tribe.” 

In their letter to NSF, Indigenous lead- 
ers recommended NNA focus on projects 
that address the sustainability of Arctic 
communities—food security and infrastruc- 
ture, in particular—and set aside 25% of 
NNA funds for Indigenous-led projects. 
“We would love to see more proposals 
coming in on those topics,” Strawhacker 
says. But an agency spokesperson says NSF 
has no plans to reserve funding for Indige- 
nous-led projects. 

Another irritant for Bering Sea communi- 
ties like Divine’s is NSF’s focus on Alaska’s 
North Slope, facing the Arctic Ocean. The 
March letter notes that “the majority of communities
with the greatest threat to infrastructure
from permafrost degradation lie
outside this area.” 

For NSF-funded scientists, a big draw on 
the North Slope is a research center and 
environmental observatory near Utqiaġvik
that UIC runs on an NSF contract. Still, “I 
don’t think lack of research infrastructure 
is the main impediment” keeping outside 
researchers from connecting with Indig- 
enous Alaskans, says anthropologist Julie 
Raymond-Yakoubian, social science program 
director for Kawerak. It’s more that scientists 
have not historically made bridge-building a 
priority, she says. 

There are success stories. One early ex- 
ample Raymond-Yakoubian points to is a 
Kawerak workshop in 2014 on ocean cur- 
rents that brought oceanographers together 
with Indigenous hunters and fishers. “At the 
end, a lightbulb went on for everybody,” she 
says. The oceanographers had a handle on 
deep off-shore currents, while the Indige- 
nous Alaskans had an intimate knowledge of 
near-shore currents: how animals navigate 
them, for instance, and where eddies form, 
trapping choice driftwood for boatmaking. 
“There’s a body of knowledge you develop 
as a community that cannot be replicated by 
Western science,” Raymond-Yakoubian says. 

Pacific walruses also led to a meeting of 
the minds. For thousands of years, many 
Indigenous Alaskan communities have 
hunted walruses for food. But as ice in the 
Bering and Chukchi seas began to dimin- 
ish 2 decades ago, biologists worried about 
the future of the animals, and in 2009, the 
nonprofit Center for Biological Diversity pe- 
titioned the U.S. Fish and Wildlife Service 
(USFWS) to list the species as threatened 
or endangered. At the same time, the retreat
of the ice was making it harder and more 
dangerous for subsistence hunters to reach 
walruses, and in 2013, two St. Lawrence Is- 
land Yupik communities declared harvest 
disasters, says Vera Metcalf, director of the 
Eskimo Walrus Commission. 

To gauge the population’s health, USFWS 
and the commission set up workshops to tap 
Indigenous insights on issues such as where 
walruses calve. “The best walrus expertise ex- 
ists out there in the hunting communities,” 
says USFWS marine mammal expert Joel 
Garlich-Miller, who is based in Alaska. USFWS
determined the walrus population appears
to be relatively large and healthy, and
in 2017 declined to list the species. 

The cooperation continues. “Things are 
much better now,” says Metcalf, who is Yupik.
It’s one case where “Our Indigenous voice is 
being heard.” 


Richard Stone is senior science editor at the Howard 
Hughes Medical Institute’s Tangled Bank Studios.

Census experts fear rush to
finish tally will yield flawed data


Mounting fears of massive U.S. undercount spur push for 
independent oversight, more time to complete 2020 count 


By Jeffrey Mervis 


With the 2020 census in its final
month, the U.S. statistical community
fears rushed deadlines and political
interference could lead to a
seriously flawed head count. They
want Congress to take two steps
to avoid that fate: ensure that the Census 
Bureau has enough time to do the job right, 
and create an independent oversight body to 
track the agency’s efforts. 

The primary purpose of the decennial 
census is to determine how many seats each 
state gets in the 435-member House of Repre- 
sentatives. The data are also used to allocate 
some $1.5 trillion per year in federal spend- 
ing, and they fuel countless research studies 
of U.S. demographic trends. 

But many social scientists believe several 
recent actions by the Trump administration 
have undermined the bureau’s ability to meet 
those obligations without sacrificing its rig- 
orous standards for quality. Last month, the 
administration cut by nearly half the time 
the bureau had earlier said it needed for its 
final push to complete the census. Demogra- 
phers fear that could result in a major under- 
count of people who are traditionally hard to 
reach—including immigrants, the poor, and 
people of color—and distort the country’s 
demographic profile. And some observers 
charge that the recent insertion of three po- 
litical appointees into new, high-level Census 
positions is part of a broader effort by the 
White House to produce a 2020 census that 
will benefit Republican-leaning states by giv- 
ing them greater representation in Congress. 

“Forcing the bureau to meet the current 
deadlines will sacrifice the accuracy of the 
census, and waste $16 billion in taxpayer 
dollars,” says Arturo Vargas, head of the National
Association of Latino Elected and Appointed
Officials (NALEO) Educational Fund,
one of several groups that have criticized the 
Trump administration’s approach to the 2020 
census. He says the administration’s actions, 
which include a failed last-minute attempt 
to add a citizenship question to the census, 
have also tarnished the agency’s “well-earned 
global reputation as a respected statistical 
agency, independent of political agendas.”

The most expensive element of every
census is tracking down the one-third of 
all U.S. residents who do not respond to re- 
peated reminders to answer the 10 questions 
and submit the form. The bureau begins its 
nonresponse follow-up (NRFU) campaign 
roughly 6 weeks after the official 1 April start 
of the decennial census. But the COVID-19 
pandemic delayed the NRFU and also led the 
bureau to ask Congress for a 4-month exten- 
sion of its 31 December deadline for submit- 
ting the state-by-state numbers used for the 
apportionment of House seats. 

The Trump administration later rescinded 
that request, however, and on 3 August 
Census Bureau Director Steven Dillingham 
announced that the agency would meet its 
end-of-the-year deadline by halting field op- 
erations on 30 September, 4 weeks earlier 
than planned. (Last week a federal judge 
blocked the bureau’s effort to wrap things up 
early pending a 17 September hearing.) 

Last week, the House committee that over- 
sees the Census Bureau released an internal 
agency report warning that the compressed 
period “creates risks for serious errors” and 
that eliminating some operations “will reduce
accuracy.” Census officials have also
canceled an exercise this month designed 
to ensure enumerators don’t miss so-called 
group quarters—places that are home to 
large numbers of residents, including college 
dormitories, prisons, and nursing homes. 

Such last-minute changes will most likely 
mean greater reliance on a process called im- 
putation to fill in any data gaps. Imputation 
uses information on file with other govern- 
ment agencies to infer the demographic char- 
acteristics of non-respondents. But experts 
say demographic groups with lower self- 
response rates are also less likely to be found 
in existing administrative records, increasing 
the odds they will be undercounted. 

In recent censuses, the nonresponse rate 
has been less than 1%—it was about 0.4% 
in 2010—leaving few holes to fill with im- 
putation. But many experts believe the non- 
response rate could reach double digits in 
2020. “And you can’t impute 15%” without 
seriously jeopardizing the accuracy of the 
overall count, warns Kenneth Prewitt, a former
Census director.

To reduce that number, Prewitt and other
census advocates want to give the bureau the 
4-month extension it originally requested. 
In May, the Democrat-controlled House 
included the extension in a pandemic re- 
lief package. But that bill has stalled in the 
Republican-led Senate. 

Prewitt and three other former Census 
directors also want Congress to establish 
an independent body to monitor the cen- 
sus in real time. On 29 July, Prewitt told the 
House panel that the group would “assess if 
the final 2020 numbers reasonably match 
what the bureau knows they should be and, 
if not, what steps the country should take.” 
The panel’s chair, Representative Carolyn 
Maloney (D-NY), supports the idea but 
hasn’t settled on a legislative strategy to 
make it happen, according to a committee 
aide. And advocates worry the administra- 
tion might balk at providing the outside 
body with the access it will need to do its job. 

Last month, the bureau took a small 
step toward greater transparency by post- 
ing daily updates on the percentage of 
addresses in each state from which it has 
collected information. But such state-level 
data don’t tell the whole story. Advocates 
for an outside panel say it could dig deeper 
by examining response rates and NRFU 
data from smaller geographic areas, such as 
several city blocks or a portion of a rural 
community. Such analyses might result in 
the panel issuing “a red flag for the possibil- 
ity of a disproportionate undercount,” says 
Robert Groves, another former Census di- 
rector who supports outside oversight. 

The expert body could also examine how 
often field workers have used proxies—inter- 
viewing a neighbor, for example—to obtain 
information. Such proxies are less reliable
than self-responses in determining whether 
a particular dwelling is occupied, as well as 
the race, sex, and age of every resident. Other 
helpful indicators of census quality might in- 
clude the share of returned forms that didn’t 
answer one or more questions. 

This summer’s arrival of three political ap- 
pointees holding newly created positions at 
the bureau has also spurred calls for more 
oversight. Social scientists fear that the ap- 
pointees—including Benjamin Overholt, an 
Army veteran with a 2013 Ph.D. in applied 
statistics and research methods as the dep- 
uty director for data—might bring a politi- 
cal agenda to how the bureau completes its 
work and releases the data. 

Census officials declined to make Overholt 
available for an interview, and the bureau has 
not spelled out his duties. But Thomas Louis, 
a former chief scientist at the agency and 
emeritus professor at Johns Hopkins Univer- 
sity, says that “Of all the things the Census 
Bureau doesn’t need at this point, a deputy 
director for data is at the top of my list. ... 
Ensuring data quality is the job of everybody 
involved in the collection, curation, and dis- 
semination of census products.” 

Social scientists also worry that a 21 July 
Trump order requiring the Census Bureau to 
exclude undocumented residents from the 
state-by-state count will damage the overall 
quality of the 2020 census (Science, 7 August, 
p. 611). Civil rights groups have sued to block 
the order, which they say violates a constitu- 
tional requirement to count every resident. 

Given all these unanswered questions, 
some observers are already speculating about 
a possible early do-over. A badly flawed census,
says one census expert who requested
anonymity, could create “a groundswell for a 
mid-decade census.” 


Residents in Texas are urged to fill out the 2020 census at a walk-in center.

GENOMICS


Massive 
project reveals 
complexity of 
gene regulation 


Data from tissues around 
the body boost search 
for genetic basis of disease 


By Elizabeth Pennisi 


When the human genome was sequenced
almost 20 years ago,
many researchers were confident 
they’d be able to quickly home in 
on the genes responsible for com- 
plex diseases such as diabetes or 
schizophrenia. But they stalled fast, stymied 
in part by their ignorance of the system of 
switches that govern where and how genes 
are expressed in the body. Such gene regula- 
tion is what makes a heart cell distinct from 
a brain cell, for example, and distinguishes 
tumors from healthy tissue. Now, a mas- 
sive, decadelong effort has begun to fill in 
the picture by linking the activity levels of 
the 20,000 protein-coding human genes, as 
shown by levels of their RNA, to variations in 
millions of stretches of regulatory DNA. 

By looking at up to 54 kinds of tissue in 
hundreds of recently deceased people, the 
$150 million Genotype-Tissue Expression 
(GTEx) project set out to create “one-stop 
shopping for the genetics of gene regula- 
tion,” says GTEx team member Emmanouil 
Dermitzakis, a geneticist at the University of 
Geneva. In a brace of papers in Science (pp. 
1318, 1331-34), Science Advances, Cell, and 
other journals this week, GTEx researchers 
roll out the final big analyses of these free, 
downloadable data, as well as tools for fur- 
ther exploiting the data. 

“This resource is invaluable” for anyone 
interested in particular diseases, or studying 
tissues or cell types, says Jan Korbel, a human 
geneticist at European Molecular Biology 
Laboratory (EMBL), Heidelberg. “It’s a public 
treasure trove,” says Jun Li, a geneticist at the 
University of Michigan, Ann Arbor. 

But the complex main analysis (p. 1318) 
drives home just how convoluted the inter- 
connections between genes and their regu- 
latory DNA can be. The papers “are written 
in bureaucratese,” and the announced results
are hard to decipher, says Dan Graur,
an evolutionary biologist at the University 
of Houston and a well-known critic of big 
science. And like other critics, he notes that 
the project, with 85% white donors, sorely 
lacks diversity and thus will miss genetic 
variation in other groups. 

GTEx can’t yet pin down sequences re- 
sponsible for illnesses such as heart disease 
and kidney failure, or trace how the layers of 
gene regulation work together. “We shouldn’t 
pack up our bags and say gene expression is 
solved,” says genomicist Ewan Birney, deputy
director general of EMBL, who led another 
big genomics project called ENCODE. 

After GTEx was launched in 2010, fami- 
lies of more than 900 deceased subjects 
who had already pledged their organs or 
tissues for transplants agreed research- 
ers could also take samples of their loved 
ones’ healthy tissues, for example brain, 
muscle, fat, pancreas, and heart. Having 
multiple tissues from the same subject gave 
researchers confidence that variation in 
gene expression between, say, muscle and 
pancreas, was real and meaningful. “For 
the first time, we have this homogeneous 
set so we could get at biological differ- 
ences between tissues,” says GTEx member 
Barbara Stranger, a geneticist at North- 
western University. 

Researchers described each sample, then 
imaged and froze all the tissues for future 
analysis. They deciphered genomes and 
quantified RNA to measure gene activity. 
In addition to comparing tissues within 
one person, they could also compare the 
same tissue in different individuals. They 
were able to link variations in DNA to gene 
expression levels using statistical analyses 
to find correlated patterns of change. The 
heart of the GTEx database is a compila- 
tion of the complex relationships between 
stretches of regulatory DNA called expres- 
sion quantitative trait loci, or eQTLs, and 
the genes they regulate. 
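
As a rough illustration of that kind of analysis, and not the GTEx consortium's actual pipeline, the sketch below checks whether a single variant behaves like an eQTL by relating a gene's RNA level to genotype dosage (0, 1, or 2 copies of the alternate allele) across donors. The function name, the toy numbers, and the choice of plain least squares are assumptions made for illustration; a real analysis would also adjust for covariates and for testing millions of variants.

```python
from statistics import mean


def eqtl_slope_and_r(dosages, expression):
    """Least-squares slope and Pearson r of expression on genotype dosage.

    Hypothetical helper for illustration; not part of any GTEx tool.
    """
    if len(dosages) != len(expression) or len(dosages) < 3:
        raise ValueError("need matched lists with at least three donors")
    mx, my = mean(dosages), mean(expression)
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("dosage and expression must both vary across donors")
    slope = sxy / sxx                 # expression change per allele copy
    r = sxy / (sxx * syy) ** 0.5      # strength of the linear association
    return slope, r


# Toy data (invented): six donors, alternate-allele counts and RNA levels.
dosages = [0, 0, 1, 1, 2, 2]
expression = [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]
slope, r = eqtl_slope_and_r(dosages, expression)
print(f"slope per allele copy: {slope:.2f}, correlation: {r:.2f}")
```

A slope well away from zero together with a strong correlation would mark the variant as a candidate regulator of that gene in that tissue. On its own, such a result shows association rather than cause.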

A pilot phase, completed in 2015, exam- 
ined nine tissues in depth (Science 8 May 
2015, p. 618) and demonstrated that sam- 
ples from corpses were reasonable stand-ins 
for living tissue, says GTEx co-leader Tuuli 
Lappalainen, a human geneticist at the 
New York Genome Center. Now, after ana- 
lyzing almost 20,000 samples, GTEx “has 
reached a size where we can gain much 
clearer, crisper insights,” says co-leader
Kristin Ardlie, a human geneticist at the 
Broad Institute. She and her colleagues 
found that almost every human gene is 
regulated by at least one eQTL, many of 
which target multiple genes and presum- 
ably affect multiple traits. 

A regulatory universe expands
Genotype-Tissue Expression (GTEx) donors, tissues,
and samples have grown with each publication milestone.
The project’s results, such as the number of
genes linked to specific regulatory sequences, have
blossomed accordingly.

Stranger uncovered another key result:
Almost every tissue, including, for example,
skin and heart, showed differences in gene
expression between males and females
(p. 1331). “The vast majority of biology is
shared by males and females,” Stranger
says, but the expression differences may 
help explain why men and women have 
different disease patterns or reactions to 
drugs. “I consider that a major finding,” 
Korbel says. 

Likewise, Broad co-leader François Aguet
and colleagues confirmed certain eQTLs ex- 
tend their reach to distant genes, even those 
on other chromosomes. GTEx documented 
143 such “trans” elements, some of which 
affect multiple genes across the genome. 

Kelly Frazer at the University of Califor- 
nia, San Diego, is already using the data to 
help make sense of so-called genome-wide 
association studies (GWAS), which pose 
major mysteries. In a GWAS, massive con- 
sortia look at the genomes of thousands 
of patients with a particular disease or 
trait and note hundreds of subtle genetic 
changes, often outside of genes themselves.
But researchers often have no clue
which of these many suspects triggers the 
disease or shapes the trait. 

For example, GWAS studies had identi- 
fied more than 500 genetic variations that 
appeared to affect heart rhythm and elec- 
trical conductance. Frazer wanted to know 
how a heart-specific transcription factor 
called NKX2-5 influenced those traits. Her 
team had identified thousands of DNA 
variations that might affect NKX2-5’s ac- 
tivity and so perhaps shift heart rhythm. 

Paola Benaglio in Frazer’s lab analyzed 
and compared those DNA variations, 
GWAS data, and GTEx data in order to 
identify which DNA variations actually regulate
NKX2-5 activity. She first narrowed the
candidate eQTLs to 55, then to nine. Finally,
using GWAS data on heart rhythms and other
tools, she zeroed in on
a single variable base on chromosome 1. 
Next, she blocked that DNA base using the 
genome editor CRISPR and confirmed that 
it alters NKX2-5 binding, Benaglio, Frazer, 
and their colleagues reported last year in 
Nature Genetics. 

“I’m sure there are hundreds of people
like me” who appreciate the database, 
Frazer says. The statistics back her up. 
Monthly, 16,000 people visit the GTEx por- 
tal, and others examine the data on other 
sites. In 2018, 900 papers cited it. 

Birney understands the enthusiasm, but 
cautions that spurious correlations between 
eQTLs and genes can arise. Homing in on a
disease-causing variant via GTEx “is not a 
slam dunk.” 

Graur, for his part, remains skeptical that 
gene activity in corpses adequately reflects 
what’s going on in the living, despite the 
team’s data on the preservation of gene ex- 
pression. “It’s like studying the mating be- 
havior of roadkill,” he says. 

As the project winds down, the U.S. Na- 
tional Institutes of Health is planning a de- 
velopmental GTEx that will enroll people 
under age 20 to create an atlas of gene ex- 
pression from birth to adulthood. In such 
follow-up efforts, a more diverse set of tissue 
donors “would be very valuable,” Korbel says.
GTEx initially shot for that goal but faltered 
because tissue and organ donors are dis- 
proportionately white. Researchers need to 
“communicate more effectively,” says Laura
Siminoff, a social scientist at Temple Univer- 
sity who was funded early on to look at GTEx 
ethics. “Otherwise we will be doing this sci- 
ence for white people.” 

The results so far cannot tell the full story 
of how the genome gives rise to a human 
being’s myriad tissues and diseases. Still, 
Birney predicts, “GTEx will get used and 
reused again and again, and there will be 
some uses I cannot predict.”
LISTEN UP


Flush with money and a hard-won respectability, alien hunters
are deploying new telescopes and tactics


AUTOMATED PLANET FINDER, CALIFORNIA


PARKES RADIO TELESCOPE, AUSTRALIA


In 2015, Sofia Sheikh was at loose ends.
Her adviser at the University of California
(UC), Berkeley, with whom she
studied hot, giant exoplanets, had left
for a new job. Browsing Reddit, she
saw a post about a lavishly funded new
search for extraterrestrial intelligence
(SETI) and noticed that its leader was
also at UC Berkeley: astrophysicist 
Andrew Siemion. She asked her former 
adviser for an introduction and met with 
Siemion when he was still unpacking boxes 
in a new office. “Everything’s kind of history 
from there,” says Sheikh, who became the 
team’s first undergraduate student. 

Sheikh is now a Ph.D. student at Penn- 
sylvania State University (Penn State), Uni- 
versity Park, where she led a radio survey of 
20 nearby star systems aligned with Earth’s 
orbital plane. If an intelligent civilization in- 
habited one of these systems and pointed a 
powerful telescope our way, they would see 
Earth passing in front of the Sun, and they 
might detect signs of life in our atmosphere. 
They might even decide to send us a mes- 
sage. The results, published in February in 
The Astrophysical Journal, were unsurprising.
“Spoiler alert: no aliens,” Sheikh jokes.

SETI researchers are used to negative re- 
sults, but they are trying harder than ever 
to turn that record around. Breakthrough 
Listen, the $100 million, 10-year, privately 
funded SETI effort Siemion leads, is lift- 
ing a field that has for decades relied on 
sporadic philanthropic handouts. Prior to 
Breakthrough Listen, SETI was “creeping 
along” with a few dozen hours of telescope 
time a year, Siemion says; now it gets thousands.
It’s like “sitting in a Formula 1 racing car,”
he says. The new funds have also
been “a huge catalyst” for training scien- 
tists in SETI, says Jason Wright, director of 
the Penn State Extraterrestrial Intelligence 
Center, which opened this year. “They really 
are nurturing a community.” 

Breakthrough Listen is bolstering radio 
surveys, which are the mainstay of SETI. But 
the money is also spurring other searches, in 
case aliens opt for other kinds of messages— 
laser flashes, for example—or none at all, 
revealing themselves only through passive
“technosignatures.” And because the
data gathered by Breakthrough Listen are
posted in a public archive, astronomers are
combing through them for nonliving phenomena:
mysterious deep-space pulses called
fast radio bursts and proposed dark matter
particles called axions. “There are untapped
possibilities here,” says axion searcher
Matthew Lawson of Stockholm University.

By Daniel Clery

Perhaps the most important conse- 
quence of Breakthrough Listen is that it 
has nudged SETI, once considered fringe 
science, toward the mainstream. “Jour- 
nals are relaxing and letting good techno- 
signature papers be published,” says astro- 
biologist Jacob Haqq-Misra of the Blue Mar- 
ble Space Institute of Science. “The giggle 
factor is reducing.” After nearly 3 decades of 
eschewing SETI, NASA organized a techno- 
signature workshop in 2018. In June, it 
awarded a grant to model the detectability 
of possible technosignatures in the atmo- 
spheres of exoplanets, its first ever SETI- 
related grant not involving radio searches. 
But some astronomers worry the funding 
boon is distorting science. Fernando Camilo, 
chief scientist of the South African Radio 
Astronomy Observatory, says Breakthrough
Listen’s voracious appetite for time on large
telescopes leaves him uncomfortable. “It
leaves less time to do astronomy.” Others say
SETI’s high-risk, rush-for-the-prize approach
could distract funders from a more rational, 
stepwise search for extraterrestrial life. “We 
do have a really thoughtful process on what 
gets funded and what doesn’t,” says Harvard 
University astronomer David Charbonneau. 
“That doesn’t happen with rich individuals.” 

But SETI proponents don’t see themselves 
as separatists. They are increasingly working 
hand in hand with those searching for exo- 
planets and studying astrobiology. “Looking 
for intelligence is the logical conclusion of 
this search for life,” says astronomer David 
Kipping of Columbia University. 


SETI STARTED SMALL. In 1960, astronomer 
Frank Drake pointed a 26-meter radio tele- 
scope in Green Bank, West Virginia, at two 
nearby Sun-like stars. He scanned frequen- 
cies around 1.42 gigahertz, which correspond 
to wavelengths of about 21 centimeters— 
the part of the spectrum where clouds of 
interstellar hydrogen emit photons. This 
21-centimeter glow is ubiquitous, and Drake 
supposed it might be a universal channel on 
the cosmic dashboard, a natural place for a 
clarion “We are here!” But his targets, Tau 
Ceti and Epsilon Eridani, were expression- 
less. The survey, called Project Ozma, saw 
no sign of artifice, such as an intense spike 
squeezed into a narrow frequency band. 
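
The link between the 1.42-gigahertz frequency and the 21-centimeter wavelength quoted above is just the relation wavelength = speed of light / frequency. A minimal sketch of that arithmetic follows; the function name is illustrative and not taken from any SETI software.

```python
# Convert a radio frequency to its wavelength using lambda = c / f.
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the meter


def wavelength_cm(frequency_hz: float) -> float:
    """Return the wavelength in centimeters for a frequency in hertz."""
    return SPEED_OF_LIGHT_M_PER_S / frequency_hz * 100.0


# The 1.42 gigahertz figure quoted in the article.
drake_frequency_hz = 1.42e9
print(f"{wavelength_cm(drake_frequency_hz):.1f} cm")  # prints 21.1 cm
```

The result, about 21.1 centimeters, matches the "about 21 centimeters" in the text.
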
With funding from NASA and the Na- 
tional Science Foundation (NSF), however, 
searches continued, with bigger telescopes 
to listen for fainter signals and hardware 
that could scan thousands and eventually
millions of narrow frequency channels at 
once. Drake devised his now famous, epony- 
mous equation that estimates how many 
communicative extraterrestrial civilizations 
may exist in the Milky Way. It depends on 
seven variables, from the rate of star forma- 
tion to the average lifetime of a civilization. 
Even though only one of the seven factors— 
star-formation rate—was known with any 
certainty, alien hunters were on the prowl. 
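
In its commonly cited form, the Drake equation is simply the product of its seven factors. The sketch below shows that structure; the parameter names are shorthand chosen here, and the numbers in the example call are arbitrary placeholders, not estimates from the article or from any survey.

```python
def drake_equation(
    star_formation_rate: float,    # new stars formed per year in the galaxy
    frac_with_planets: float,      # fraction of stars that have planets
    habitable_per_system: float,   # habitable planets per planet-bearing star
    frac_life: float,              # fraction of those on which life arises
    frac_intelligence: float,      # fraction of those that evolve intelligence
    frac_communicative: float,     # fraction of those that emit detectable signals
    lifetime_years: float,         # average years a civilization keeps signaling
) -> float:
    """Return the estimated number of communicative civilizations."""
    return (
        star_formation_rate
        * frac_with_planets
        * habitable_per_system
        * frac_life
        * frac_intelligence
        * frac_communicative
        * lifetime_years
    )


# Placeholder inputs chosen only to show how the factors multiply together.
example = drake_equation(2.0, 0.5, 1.0, 0.1, 0.1, 0.1, 10_000)
print(f"{example:.1f} civilizations under these made-up inputs")
```

Because the factors multiply, uncertainty in any one of them carries straight through to the answer, which is why the estimate stays so loose while most terms are unknown.
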
In 1992, NASA decided to look harder, 
only to quickly reverse course. It embarked 
on the Microwave Observing Project, a
10-year, $100 million SETI search using
several large telescopes. But the following
year, the project was ridiculed and cut by
lawmakers focused on reducing the federal
budget deficit. Ever since, NASA has
mostly shied away from SETI.

GREEN BANK TELESCOPE, WEST VIRGINIA

The four main telescopes used by Breakthrough
Listen are scanning nearby stars and galaxies
for any radio or laser messages beamed at Earth.

Even as federal funding shriveled, the 
1990s gave SETI an unexpected gift. Until 
then no one had detected an exoplanet, much 
less a potentially hospitable one, but that de- 
cade brought a host of discoveries. Since then, 
missions such as NASA’s Kepler telescope
have suggested that planetless stars are rare, 
and that about one in five Sun-like stars has 
potentially habitable Earth-size planets—two 
more factors in the Drake equation that have 
fueled optimism among SETI advocates. The 
turn-of-the-century tech boom offered an- 
other boost: newly minted billionaires with 
a taste for space. A high point came in 2007 
with the inauguration of the Allen Telescope 
Array, a SETI observatory in California kick- 
started with $11.5 million from Microsoft co- 
founder Paul Allen. 

Then the field took another plunge. The 
2008 financial crisis struck and within a 
few years, with federal and state funding 
tight, UC Berkeley withdrew from the proj- 
ect. The array was put into hibernation for 
8 months. A planned expansion from 42 to 
350 dishes never materialized. “SETI was 
entirely decimated,” Siemion says. “I was
one of maybe two or three in the whole 
world working on SETI.” 

That was when Yuri Milner called. 
BORN AND EDUCATED in Moscow, Milner
worked as a particle physicist at the Leb- 
edev Physical Institute. In 1990, as the Soviet 
Union collapsed, he left to study business at 
the University of Pennsylvania, and in 1999 
he founded an internet investment fund. The 
fund was an early backer of Facebook and 
Twitter, and later Spotify and Airbnb. Forbes 
magazine puts Milner’s net worth at $3.8 billion.
“I made some lucky investments,” he
tells Science. 

Milner says he’s always felt a connection 
with space and SETI. He was born in 1961, 
days after Drake convened the first SETI con- 
ference. He is named after Yuri Gagarin, the 
first cosmonaut. Once he had 
built up a fortune, “I discov- 
ered that now I can give back 
to science,” he says. He knew
of SETI’s dire financial straits,
and he believed his money 
and knowledge of the tech 
industry could help speed 
up the search. Siemion’s 
UC Berkeley center, across 
the San Francisco Bay from 
Milner’s home in Silicon Val- 
ley, became the beneficiary. 

Breakthrough Listen set 
out ambitious goals (Science, 
24 July 2015, p. 357). It would survey 1 mil- 
lion of the closest stars to Earth and 100 
nearby galaxies using two of the world’s most 
sensitive steerable telescopes, the 100-meter 
Green Bank Telescope in West Virginia and 
the 64-meter Parkes radio telescope in Aus- 
tralia. Buying up about 20% and 25% of 
the time on those telescopes, Breakthrough 
Listen promised to cover 10 times more sky 
than previous surveys and five times more 
of the radio spectrum, and gather data 
100 times faster. 

Achieving these goals required new hard- 
ware. The key electronic component is a 
digital backend, which chops telescope data 
into ultrathin frequency slices and records it. 
Siemion says Breakthrough Listen’s backends 
are “orders of magnitude more powerful than 
anything else on site.” The instruments are
available for 100 hours every year to other as- 
tronomers interested in such fine frequency 
resolution. That allocation is often oversub- 
scribed at Green Bank, Siemion says, ever 
since the backend helped characterize the 
first repeating fast radio burst. 
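
The article does not describe how the backend does its slicing. One standard way to split a sampled signal into frequency channels is a discrete Fourier transform, and the sketch below uses a deliberately naive one on a tiny synthetic signal. The function names, the 64-sample length, and the test tone are all assumptions made for illustration; real backends process vastly more data with specialized hardware.

```python
import cmath
import math


def channelize(samples: list[float]) -> list[float]:
    """Return the power in each frequency channel via a naive DFT.

    Only the first half of the channels is returned, because the
    spectrum of a real-valued signal is mirrored in the second half.
    """
    n = len(samples)
    powers = []
    for k in range(n // 2):
        acc = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        powers.append(abs(acc) ** 2 / n)
    return powers


def make_tone(n: int, channel: int, amplitude: float = 1.0) -> list[float]:
    """Synthesize n samples of a pure tone centered on one channel."""
    return [amplitude * math.cos(2 * math.pi * channel * t / n) for t in range(n)]


# A synthetic narrowband signal: all of its energy sits in channel 10.
samples = make_tone(64, channel=10)
powers = channelize(samples)
peak = max(range(len(powers)), key=powers.__getitem__)
print(f"strongest channel: {peak}")  # prints strongest channel: 10
```

The finer the channels, the more sharply a narrowband signal stands out, which is the appeal of very fine frequency resolution for this kind of search.
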

The project is adding a major new tele- 
scope to its mix of collaborations: MeerKAT, 
a South African array of 64 dishes each 
13.5 meters across (Science, 22 June 2018, 
p. 1285). Instead of buying time on the array, 
Breakthrough Listen is tapping into the data 
stream while the telescope observes its regu- 
lar targets—a procedure known as commen- 
sal observing. “You take what you can get,” 
Camilo says. “When it works, it’s fantastic.” 
Commensal observing will also be added to 
the Karl G. Jansky Very Large Array in New 
Mexico, the workhorse of U.S. radio astron- 
omy, in a project led by the privately funded 
SETI Institute. 

Gathering data sets is one thing; scour- 
ing heaps of them for alien messages is an- 
other. SETI researchers have long looked 
for energy packed into narrow frequency 
signals—something that is hard for nature 
to replicate, although astronomers need to 
exclude humanmade signals. One test is to 
see whether the signal’s frequency drifts over 
time: An alien transmitter would be on a moving
planet, causing a Doppler
shift. If the frequency is rock 
steady, it’s likely to be earthly 
interference. Similarly, if the 
signal persists when the tele- 
scope moves from its target, 
it’s noise from Earth. 
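
The size of such a drift can be estimated from the standard first-order Doppler relation: the drift rate is roughly the signal frequency times the acceleration along the line of sight, divided by the speed of light. The sketch below plugs in Earth-like values purely as an assumption, since nothing is known about any transmitter's home planet; the function names are illustrative.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458


def drift_rate_hz_per_s(frequency_hz: float, acceleration_m_per_s2: float) -> float:
    """First-order Doppler drift rate for a given line-of-sight acceleration."""
    return frequency_hz * acceleration_m_per_s2 / SPEED_OF_LIGHT_M_PER_S


def equatorial_spin_acceleration(radius_m: float, rotation_period_s: float) -> float:
    """Centripetal acceleration at the equator of a spinning planet."""
    omega = 2 * math.pi / rotation_period_s
    return omega**2 * radius_m


# Assumed, Earth-like values used only for illustration.
accel = equatorial_spin_acceleration(radius_m=6.378e6, rotation_period_s=86_164)
drift = drift_rate_hz_per_s(1.42e9, accel)
print(f"{drift:.2f} Hz/s")  # prints 0.16 Hz/s
```

A drift of a fraction of a hertz per second is small but measurable across channels only a few hertz wide. A signal showing no drift at all is the kind the text describes as likely earthly interference.
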

But aliens might send 
something more complex 
than a single loud note. How 
do you scan SETI data for 
something that just seems 
anomalous or weird? Re- 
searchers have been trying 
to enlist artificial intelligence 
(AI), but it hasn’t been easy. One species of 
AI, natural language algorithms, can recog- 
nize key words in the flow of human speech— 
think of Amazon’s Alexa, or eavesdroppers at 
the National Security Agency—after being 
trained on vast speech data sets. But the 
huge number of narrow frequency channels 
in SETI data overwhelms these algorithms. 

Converting the data stream into 2D dia- 
grams that resemble images works better, 
at least in tests, in which machine vision al- 
gorithms picked out strange pictures from 
a torrent of similar ones. “We have to guess 
what an anomaly might look like and train 
the algorithm to look for this, or look for 
things that look similar,” says Steve Croft of
UC Berkeley’s SETI Research Center. 


THE FOCUS OF SETI searches tends to reflect 
the technology of the times. Radio was in its 
heyday when Drake started out. But as lasers 
have grown in power and sophistication, so 
have efforts to spot alien laser signals with 
so-called optical SETI. 

Astronomers have carried out optical 
searches with modest telescopes since the 
1990s. Breakthrough Listen is doing its own, 
with time on the 2.4-meter Automated Planet 
Finder (APF) telescope at the Lick Observa- 
tory in California. APF has been scanning a 
sample of stars to distances up to 160 light- 
years but will now work through a new list: 
stars with potentially habitable planets iden- 
tified by NASA’s Transiting Exoplanet Survey 
Satellite (Science, 30 March 2018, p. 1453).

Others are developing telescopes that 
wouldn’t need to target individual stars. 
The LaserSETI project, funded by the SETI 
Institute, is a collection of $30,000 mini- 
observatories, made up of an off-the-shelf 
fisheye lens, two cameras, and electronics 
that would gather light from the entire sky. 
The first was installed last year on an observa- 
tory roof north of San Francisco. Eventually, 
the institute wants to install 60 instruments 
around the world for 24/7 coverage. 

LaserSETI’s small telescopes would only 
pick up an especially bright flash from a 
nearby source. Shelley Wright of UC San 
Diego hopes to see much farther with the 
Pulsed All-sky Near-infrared Optical SETI 
(PANOSETI), an all-sky telescope able to 
detect ultrashort laser pulses across all op- 
tical wavelengths. 

PANOSETI’s design includes lightning-fast 
photon counters sensitive to pulses less than 
one-billionth of 1 second long. “It’s hard for 
nature to make that,” Shelley Wright says. It
relies on a Fresnel lens, a type used in light- 
houses to focus light into a narrow beam. 
Flipped over, a Fresnel can gather light from 
a 10°-wide patch of sky onto the photon coun- 
ters. The team is building two observatories, 
each an array of 80 telescopes with lenses 
50 centimeters across, bunched together in 
a fly’s eye arrangement. The plan is to site 
the pair 1 kilometer apart—to help root out 
false positives—at the Palomar Observatory 
in California. Funded by Qualcomm co- 
founder Franklin Antonio, the project has 
built five telescopes but has been stalled by 
the COVID-19 pandemic. 


THEN AGAIN, even intelligent aliens might 
be too busy or too shy to send messages to 
the stars. So SETI researchers also hope to 
detect passive signs of technology. People’s 
ideas about what to look for often reflect 
their time: Consider the 19th century “dis- 
covery” of canals on Mars when canals 
were still a common form of transport 
on Earth. In 1960, amid rapid economic 
growth and concerns about energy short- 
ages, physicist Freeman Dyson imagined 
an advanced society might build a mega- 
structure surrounding a star to capture its 
energy (Science, 3 June 1960, p. 1667). Such 
“Dyson spheres” continue to fascinate and 
were suggested as an explanation for the 
strange dimmings of the star KIC 8462852, 
known as Tabby’s Star. In 2015, Jason 
Wright led a search for the glow of Dyson 
spheres in 100,000 nearby galaxies, using 
data from NASA’s Wide-field Infrared Sur- 
vey Explorer satellite. 

Technosignatures could be more subtle. In 
the not-too-distant future, ultrasensitive radio
telescopes might be able to pick up the
beams of a radar, like the ones used for air
traffic control, from a distant exoplanet. Future
optical telescopes might reveal the glow
of a city’s lights or its infrared warmth. Heavy
industry or geoengineering might leave
fingerprints in a planet’s atmosphere.

Shelley Wright (top) is using the wide light-gathering power of Fresnel lenses, similar to those used in
lighthouses, to search for alien laser signals. Two are being tested at Lick Observatory in California (bottom).

These efforts chime with searches for bio- 
signatures, detectable marks that organic 
life might leave on an exoplanet (Science, 
3 November 2017, p. 578). “The line be- 
tween technosignatures and biosignatures 
is blurring,” Sheikh says. “It makes sense to 
observe both.” In deciding to fund the 2018 
workshop on technosignatures, NASA felt 
that they could be discussed “on a firmer 
scientific foundation than before,” says 
Michael New, the agency’s deputy associate 
administrator for research. After the work- 
shop, the wording in NASA funding calls 
that had for some years excluded SETI- 
related proposals quietly disappeared. 

In June, Jason Wright and his colleagues 
benefited from the new openness when
they were awarded a grant to model exo- 
planet atmospheres and put together a 
“library” of potential technosignatures, 
which astronomers can refer to when ob- 
serving exoplanets. The team will first 
model chlorofluorocarbons—a pollutant
that isn’t produced naturally—and vast so- 
lar power arrays, because they would leave 
an obvious cutoff in the ultraviolet part of 
the spectrum. “What we should look for is 
things that can’t be avoided, civilization’s 
manifestations in the biosphere,” says
Adam Frank, lead investigator on the grant 
at the University of Rochester. 


BUT EVEN AFTER the fanfare of Breakthrough 
Listen, SETI remains far from a central con- 
cern for most astronomers. In 2018, panels of 
researchers convened by the National Acad- 
emies of Sciences, Engineering, and Medi- 
cine (NASEM) drew up strategies for NASA 
on astrobiology and exoplanets. They made
scant mention of technosignatures and 
didn’t advise NASA to spend any money on 
the topic, or, more generally, SETI. 

SETI enthusiasts say they are trying 
to avoid being shut out of an even bigger 
NASEM effort: its decadal survey of astro- 
physics, a once-a-decade priority setting 
exercise that is influential with funding 
agencies and legislators. The survey is due 
to report early next year. “We’ve made a big 
push to get the decadal survey ... to explic- 
itly say that NASA and the NSF need to nur- 
ture this field,” Jason Wright says. He and 
colleagues made nine submissions, known 
as white papers, to the survey, compared 
with a single white paper in the previous 
survey. Sheikh says: “There are signs the 
winds are starting to shift.” 

But many astronomers think the more 
important hunt is for alien life of a more 
basic kind, not the higher risk search for 
technological societies. “We have to invest in 
general questions,” says Charbonneau, who
co-chaired the NASEM panel that developed 
the NASA exoplanet hunting strategy. “If we 
just go for the prize and don’t find anything, 
what have we learned from that?” 

Mainstream astrobiologists hope the 
decadal survey will give a thumbs up to the 
Large UV/Optical/IR Surveyor, or LUVOIR, a 
proposed NASA space telescope as much as 
six times wider than the Hubble Space Tele- 
scope (Science, 14 December 2018, p. 1230). 
It would scrutinize habitable planets for 
biosignatures and estimate the fraction of 
them that support life—another term in the 
Drake equation. “The progress we’ve made 
as scientists follows the terms of the Drake 
equation in order,” says astrobiologist Shawn
Domagal-Goldman of NASA Goddard Space 
Flight Center. “That progress could lead to a 
search for technosignatures. I could see LU- 
VOIR being used to do that, even though it 
wasn’t designed for such a search.” 

Jason Wright, however, thinks the poten- 
tial payoff of SETI is just too tempting to 
put off the search. In July, he and his colleagues
reported the “discovery space”—all
the possible locations, frequencies, sensi- 
tivities, bandwidths, timings, polarizations, 
and modulations—that SETI radio surveys 
have so far explored. The result: If the en- 
tire discovery space is represented by the 
world’s oceans, SETI has so far searched the 
volume of a hot tub. 

Milner seems ready to support at least a 
few more SETI hot tubs. He says he wants 
Breakthrough Listen to continue past 2025, 
when his initial funding runs out. “It’s one 
of the most existential questions in our uni- 
verse,” he says. “Just knowing we are not 
alone ... is something that can bring us to- 
gether here on Earth.” 
A reading list for uncertain times


From an incisive ethnography of predictive policing to a compelling
indictment of technology-enabled learning tools, the books on this
year’s fall reading list offer valuable context to the myriad challenges
currently facing humanity. Dive deep into a public health disaster
shrouded in secrecy, sit with the uncomfortable questions raised by
a fictional foray into the future of intimacy, confront the challenges
to sustainable development posed by environmental racism, and learn
what a QR-coded chicken in rural China portends about the future
of agriculture. When you are through, sit back and marvel at the odds
stacked against humanity from the start with an entertaining romp
through evolution and then leave your earthly worries behind with
an ambitious tour of the Solar System. —Valerie Thompson


A Series of Fortunate Events


Reviewed by Ivor Knight


Through a series of chance events, the pathogen
we now know as severe acute respiratory
syndrome coronavirus 2 emerged in 2019
and infected millions of humans within a 
span of 6 months. But chance has driven 
more than just the planet’s latest pandemic. 
In his new book, A Series of Fortunate Events: 
Chance and the Making of the Planet, Life, 
and You, Sean B. Carroll takes readers on 
an entertaining tour of biological discovery 
that emphasizes the dominant role played by
chance in shaping the conditions for life on 
Earth. Along the way, he provides insights 
and humor that make the book a quick, lively 
read that both educates and entertains. 

Carroll begins with one of the most con- 
sequential chance events to have occurred 
in the history of our planet: the Cretaceous- 
Paleogene asteroid impact on the Yucatan 
Peninsula that resulted in the extinction 
of the dinosaurs and expansion of mam- 
mals. Given Earth’s rotational speed, if 
the asteroid had hit 30 minutes earlier or 
later, scientists believe it would have made 
a much less consequential impact, land- 
ing in either the Atlantic or Pacific Ocean. 
If that had happened, there might still be 
dinosaurs today, but no humans. As he 
does throughout the book, Carroll com- 
pares the example from science with an 
example from popular culture, describing 
the comedian Seth MacFarlane’s good for- 
tune to have narrowly missed (by 30 min- 
utes) one of the flights that was hijacked on 
11 September 2001. 

Fundamental topics such as the roles 
that mutation and natural selection play 
in the evolution of diverse life-forms, the 
genetics of human reproduction, cellular 
mechanisms of acquired immunity, and 
the development of cancer are all treated 
within a framework where chance dominates.
Carroll explains in detail how chance
creates the genetic diversity upon which 
natural selection acts and results in the 
richness of species on Earth, as well as how 
random combinations among just 163 gene 
segments make possible a human immune 
system that can produce up to 10 billion 
different antibodies. Readers will likely be 
particularly interested to learn that their 
genome is only one of the 70 trillion pos- 
sibilities that could have been produced by 
their parents. 
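
The review does not say how the 70 trillion figure is derived. One calculation that reproduces it, offered here as an assumption about the reasoning rather than as Carroll's stated method, counts only the independent shuffling of the 23 human chromosome pairs in each parent and ignores recombination, which would push the number far higher.

```python
# Each parent passes on one chromosome from each of 23 pairs, so a single
# parent can produce 2**23 distinct chromosome combinations in a gamete.
CHROMOSOME_PAIRS = 23

gametes_per_parent = 2**CHROMOSOME_PAIRS
# Any egg can meet any sperm, so the two counts multiply.
possible_genomes = gametes_per_parent**2

print(f"{gametes_per_parent:,} combinations per parent")  # 8,388,608
print(f"{possible_genomes:,} possible children")  # 70,368,744,177,664
```

That total, roughly 7.0 x 10^13, rounds to the 70 trillion quoted in the review.
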

Written in a conversational style, the 
book reads like an updated version of 
Jacques Monod’s 1970 Chance and Neces- 
sity that speaks directly to the reader, 
making complex subject matter more ac- 
cessible. There is also a suggested reading 
list and an extensive bibliography included 
for further exploration. 

Carroll’s central argument, that we are all 
here by luck, is certainly clear and compel- 
ling. What we choose to do with that luck, 
however, is where things really get interest- 
ing. Books such as this remind us to make 
our unlikely time here count. 


A Series of Fortunate Events: Chance and the 
Making of the Planet, Life, and You, Sean B. Carroll, 
Princeton University Press, 2020. 224 pp. 
Unsustainable Inequalities


Reviewed by Gillian Bowser


Does a hurricane discriminate between the 
wealthy and the poor? Do earthquakes target 
specific victims? How does systemic racism 
influence development goals? In academic 
explorations of sustainable development and 
environmental responsibilities, our assump- 
tions about the relationship between income 
and energy consumption remain largely 
rooted in the idea that social inequalities 
decrease as countries develop, thus reducing 
environmental inequality. No such relation- 
ship appears to actually exist. 

In his sobering but essential new book, 
Unsustainable Inequalities, economist Lucas 
Chancel explores the intersections of social 
justice and environmental sustainability 
with a focus on global goals established at 
the 2012 United Nations Conference on 
Sustainable Development, which informed 
the underlying philosophy of the 2015 Paris 
Agreement of the United Nations Framework 
Convention on Climate Change (UNFCCC) 
(1). Framing his narrative through the lens 
of intragenerational economic inequalities, 
he identifies social inequality as a core driver
of environmental unsustainability that leads
to a vicious circle wherein the rich consume
more and the poor lose access to environmental
resources and become increasingly
vulnerable to environmental shocks.

Published by AAAS

In 1987, the World Commission on Envi- 
ronment and Development issued a report 
called “Our Common Future” that defined 
sustainable development as “development 
that meets the needs of the present without
compromising the ability of future genera- 
tions to meet their own needs” (2). The idea 
of intergenerational environmental equity 
became a cornerstone concept, shifting cli- 
mate policy toward the common but differ- 
entiated responsibilities enshrined in the 
UNFCCC. Yet questions about intergenera- 
tional responsibility and the equitable im- 
pacts of climate change and environmental 
degradation remain. Environmental racism, 
wherein communities of color are dispro- 
portionately exposed to environmental risks, 
is inseparable from social justice, Chancel 
argues, and the attainment of sustainable 
development that also protects the environ- 
ment across generations is “extremely dif- 
ficult” without first addressing economic 
inequality within a single generation. 

The notion that we may be able to at- 
tain sustainable development and achieve 
equal responsibility for environmental degradation
feels more unreachable than ever
in a world upended by a global pandemic. 
In prepandemic times, many nations had 
already failed to implement or participate 
in local and global environmental justice 
efforts, and taxation schemes to level re- 
sponsibilities for environmental pollution 
have proven wildly unpopular. And while 
Chancel argues that common indicator
frameworks such as the United Nations’ 
Sustainable Development Goals encourage 
nations to learn from one another, the con- 
tinued rise of social inequality is a stark re- 
minder of the difficult road ahead. 


REFERENCES AND NOTES 


1. Paris Agreement to the United Nations Framework 
Convention on Climate Change, 12 December 2015, TIAS 
No. 16-1104. 

2. World Commission on Environment and Development, 
Our Common Future (Oxford Univ. Press, 1987). 


Unsustainable Inequalities: Social Justice and 
the Environment, Lucas Chancel, Malcolm DeBevoise, 
translator, Belknap Press, 2020. 184 pp. 


Failure to Disrupt 


Reviewed by Kanwal Singh


As the pandemic forces so many school 
systems and learning institutions to move 
online, the desire to educate students 
well using online tools and platforms is 
more pressing than ever. But as Justin 
Reich illustrates in his new book, Failure 
to Disrupt, there are no easy solutions or 
one-size-fits-all tools that can aid in this 
transition, and many recent technologies 
that were expected to radically change 
schooling have instead been used in ways 
that perpetuate existing systems and their 
attendant inequalities. 

The first half of the book discusses the brief 
histories, limited successes, and challenges of
three types of large-scale technology-driven
learning environments: instructor-guided, 
such as lectures taught through massive 
open online courses (MOOCs); algorithm- 
guided (e.g., Khan Academy); and peer- 
guided (e.g., the online coding community 
known as Scratch). Reich gives a solid ac- 
counting of the conditions needed for suc- 
cess with these models, the difficulties and 
limitations involved in adopting them in 
K-12 schooling, and the challenges that 
arise when we attempt to compare different 
approaches to one another. He argues that 
although we might think that the availabil- 
ity of a technology is its biggest limiter, the 
truth is that educational systems are simply 
not constructed to allow for experimentation 
and new ways of learning. 

Reich describes himself as committed to 
“methodological pluralism.” He supports 
the use of an array of learning tools and 
mechanisms, although he confesses to a 
particular admiration for peer-guided en- 
vironments. He argues, however, that the 
incentive structures in formal education 
do not encourage the more innovative and 
deeper learning that can blossom in these 
environments. If we insist on maintaining 
current methods of assessment and rank- 
ing, which center on individual achieve- 
ment, then peer-guided instruction will 
remain relegated to the sidelines. 

The second part of the book expands 
on the challenges of implementing educa- 
tional technologies. Reich’s main argument 
here is that educational systems are inher- 
ently conservative and that change will 
happen, albeit slowly and incrementally, 
only if technology designers, teachers, and 
administrators work in partnership to un- 
derstand the desired learning goals and 
the parameters that define and constrain 
the learning environments. 


Our current educational systems, many of which have recently gone virtual, are not built for experimentation. 
One of the most intractable pieces of the 
educational technology puzzle is the need 
to effectively conduct large-scale assess- 
ment, especially when the skills being as- 
sessed are not things that computers can 
do. Here, Reich cites a humorous example 
of an automated grading system giving 
high marks to an essay that begins with 
the technically grammatically correct sen- 
tence: “Educatee on an assassination will 
always be a part of mankind.” 

At the end of the book, Reich offers four 
questions that he finds especially useful 
to consider when examining a new large- 
scale educational technology. Perhaps the 
most useful question is the first: “What’s 
new?” Despite what “edtech evangelists” 
might claim, new technologies often have 
closely related ancestors that can help 
predict their success, he argues. In the 
end, however, new technologies alone are 
unlikely to have a substantial impact on 
schooling. We must also be open to chang- 
ing educational goals and expectations 
according to the possibilities offered by 
emergent technologies. 


Failure to Disrupt: Why Technology Alone Can’t 
Transform Education, Justin Reich, Harvard University 
Press, 2020. 336 pp. 


Blockchain 
Chicken Farm 


Reviewed by Arti Garg 


In Blockchain Chicken Farm, Xiaowei Wang 
reveals the myriad ways that technology 
is transforming our lives. They unveil, for 
example, the unexpected connections that 
exist between industrial oyster farming in 
rural China, livestream-fueled multilevel 
marketing schemes in the United States, 
and the app-enabled gig economy in which 
Chinese influencers participate. Following 
the threads of places and people woven 
together by new technologies, Wang helps 
readers trace the patterns emerging in the 
tapestry of our tech-infused world. 

Each chapter provides a view into not just 
how we use technology but why and to what 
end. Emphasizing the often-hidden human 
engine that powers our app-driven economy, 
Wang exposes the flaw in our tendency to 
conflate societal and cultural aspirations 
with the promises of technology and chal- 
lenges us to honestly measure what value 
technology delivers. In the 21st century, they 
argue, we demand that technologists solve 
the problems that our governments and 
communities have not. In doing so, we inad- 
vertently empower companies to exploit and 
amplify those same problems. 
Farms, like this one in China’s Hebei Province, must overcome growing distrust in the food chain. Some have turned to tracking livestock with blockchain technology. 


Most of Wang’s vignettes relate to Chi- 
nese agriculture. This decision, which 
roots the narrative in the visceral lan- 
guage of human sustenance, grounds the 
heady subject matter. The titular example 
takes readers to the GoGoChicken farm in 
Sanqiao, a “dreamlike” village that sits in 
one of the poorest regions in China. Here, 
Wang introduces the straw-hatted “Farmer 
Jiang,” who has partnered with his village 
government and a blockchain company to 
sell free-range chickens via an e-commerce 
site. Jiang’s chickens sell for RMB 300 
(~$35) each, an amount equal to 6% of the 
average annual household income in that 
part of China. Wang explains that high- 
profile failures of regulatory oversight have 
left many Chinese with a deep distrust of 
the food supply chain and that upper-class 
Chinese urbanites will pay a premium 
for reassurance about food safety, which, 
in this case, takes the form of a vacuum- 
sealed chicken that comes with a QR code 
revealing blockchain-logged details of its 
life on the farm. 

Wang suggests that Americans, driven by 
concerns over animal welfare, may desire 
similar reassurance about their food’s prov- 
enance. In both China and America, they 
observe, technology allows the upper class 
to buy its way around governmental and so- 
cietal shortcomings at prices that are out of 
reach for most people. Technology does not 
correct the intrinsic problems, and most 
cannot reap the benefits of the technologi- 
cal “solutions.” 

Without resorting to an overly romanti- 
cized notion of rural wisdom, Wang treats 
individuals like Jiang, whose future re- 
mains uncertain owing to the vagaries of 
e-commerce supply chains, with respect 
and empathy. Because of this, they largely 
succeed in their goal of reframing our un- 
derstanding of technology as neither the 
cause of nor the solution to our problems 
but rather as a force reshaping the human 
experience in fundamental ways. 


Blockchain Chicken Farm: And Other Stories of 
Tech in China’s Countryside, Xiaowei Wang, Farrar, 
Straus and Giroux, 2020. 256 pp. 


The Secret Lives 
of Planets 


Reviewed by Heather Bloemhard 


The Secret Lives of Planets by Paul Murdin 
includes a plethora of information about 
our Solar System. Murdin covers planets, 
asteroids, moons, dwarf planets, and more, 
approximately one per chapter. Even exo- 
planets—the planets that orbit a star other 
than our Sun—are referenced frequently, 
although not in their own chapter. Using 
only a few images, Murdin illustrates the 
historical and physical concepts that sur- 
round each of these elements in prose 
peppered with anecdotes from his own 
career as an astronomer. 

While the book’s tone is pleasant and 
conversational, the discussions are often 
technical in nature, and I worry that some 
readers may be frustrated by its many tan- 
gents and loose organizational structure. 
For example, in his discussion of the for- 
mation of Mercury, Murdin references the 
formation of exoplanets, the discovery of 
‘Oumuamua, and Earth’s fossil record. The 
same chapter also refers to Earth and Venus 
to help explain orbital eccentricity and 
precession, but this analogy may fall short 
for lay readers. 

I was also disappointed that Murdin re- 
lied almost exclusively on the accomplish- 
ments of European men to tell the story of 
how our understanding of the Solar System 
emerged over time. He writes, for example, 
of Nicolaus Copernicus’s revelations about 
the geometry of our solar system but ne- 
glects the work of Muslim astronomers 
who developed models of heliocentric or- 
bits hundreds of years earlier. Murdin is 
far from alone in this misstep, but it is well 
worth striving to do better. 

Despite these criticisms, every reader 
will learn something from this ambitious 
book. Did you know, for example, that some 
scientists once believed there were oases of 
vegetation on Mars, or that others believed 
that martians might try to colonize Earth? 
From the exchange of planetary material 
by way of meteorites to the formation of 
asteroids, Murdin covers a wide range of 
astronomical topics, including the aurora 
of Jupiter, the mysteries of Uranus, and the 
potential of the moons of Jupiter and Sat- 
urn to support recognizable life. 

I found Murdin’s personal recollections 
to be the most compelling feature of The 
Secret Lives of Planets. He tells the story 
of how, as a student, he observed the shad- 
ows cast by the tops of clouds of different 
heights on Venus using a telescope similar 
to the one used by Galileo and uses this 
anecdote as a starting point to explain 
what the Italian astronomer discovered 
about the planet. Recounting the time he 
observed the launch of Cassini-Huygens, 
a probe sent to Saturn’s moon Titan, Murdin 
explains what scientists had hoped 
to learn from this mission and what they 
ended up discovering. He also discusses at- 
tending the 2006 International Astronomi- 
cal Union conference, where a debate was 
held about the definition of a planet, and 
reveals what it was like to cast a vote on 
the final decision. 

In the end, there is much to recommend 
The Secret Lives of Planets as an introduc- 
tory text on our solar system. 


The Secret Lives of Planets: Order, Chaos, 
and Uniqueness in the Solar System, Paul Murdin, 
Pegasus, 2020. 288 pp. 


The Great Secret 


Reviewed by Peter Reczek 


Modern cancer therapies are often the 
result of years of targeted research and 
development, making it easy to forget that 
many of the field’s early breakthroughs 
had as much to do with chance as they 
did with preparation. In The Great Secret, 
Jennet Conant recounts one such break- 
through, which was made in the wake of a 
deadly disaster. 

Conant’s engrossing story is set in the Ital- 
ian port town of Bari, which was used as an 
important staging area for the distribution 
of supplies supporting Allied troops as they 
pushed north through Italy during World 
War II. On 2 December 1943, a day that 
would later be referred to as “a little Pearl 
Harbor,” German military aircraft sank more 
than 20 Allied ships anchored in Bari, lead- 
ing to the loss of more than 1000 Allied ser- 
vicemen and Italian civilians. 

Lieutenant Colonel Stewart Alexander, a 
medical officer attached to General Eisen- 
hower’s headquarters in North Africa, was 
sent to coordinate medical relief efforts. In 
Bari, Alexander found “a nightmarish scene.” 
In the aftermath of the air raid, “The walk- 
ing wounded staggered in [to the hospital] 
unaided, suffering from shock, burns, and 
exposure after having been in the cold water 
for hours before being rescued. Others had to 
be supported, as they cradled fractured arms 
in improvised slings or dragged mangled 
limbs...Almost all of them were covered in 
thick, black crude oil,” writes Conant. In ad- 
dition to the acutely injured, Alexander dis- 
covered victims whose injuries had emerged 
days after the attack and could not be attrib- 
uted to the percussive effects of the bombing. 

After analyzing the positions of the ailing 
seamen, Alexander reported that an Ameri- 
can Liberty ship, the John Harvey, was the 
source of the problem, speculating that it 
likely contained a secret cache of nitrogen 
mustard (i.e., mustard gas). Both the Ameri- 
can and British governments denied any 
such cache, but Conant reveals that Alexan- 
der persisted, and his controversial report 
(which, crucially, documented a decrease in 
white blood cell counts in the victims) was 
accepted by the Allied High Command with 
a classification of “Secret.” 


A port official surveys the aftermath of the 1943 air raid on Bari, Italy. Injuries were exacerbated by a secret cache of mustard gas on the John Harvey Liberty ship. 

After the war, Colonel C. P. “Dusty” 
Rhoads, who had been Alexander’s supe- 
rior during the Bari investigation, reasoned 
that an agent that reduced white blood cells 
might be useful in treating some forms of 
leukemia. While serving as the first direc- 
tor of the Sloan Kettering Institute, Rhoads 
oversaw a clinical trial to test nitrogen 
mustards as potential therapeutic agents 
for the treatment of neoplastic disease. The 
results exceeded expectations. “In their first 
attempt to treat patients with inoperable 
lung cancer with nitrogen mustard, the Me- 
morial team reported that of the thirty-five 
patients, 74 percent showed some clinical 
improvement,” writes Conant. Many similar 
compounds, collectively known as alkylat- 
ing agents, are still the foundation of the 
combination chemotherapy used to treat 
some forms of leukemia. 

Drawing largely from archival research, 
Conant relies on a loose conversational 
style to convey a fast-paced medical detec- 
tive story that demonstrates how careful 
scientific observation can yield unexpected 
benefits and serves as a reminder of the 
difficult choices made by governments to 
balance public health and secrecy in mat- 
ters of security. 


The Great Secret: The Classified World War II 
Disaster That Launched the War on Cancer, 
Jennet Conant, Norton, 2020. 400 pp. 
Entanglements 


Reviewed by Esha Mathew 


In quantum physics, entanglement is a 
property wherein two particles are inex- 
tricably linked. Put another way, entangled 
particles are never truly independent of 
each other, no matter the distance between 
them. It is fitting then that Entanglements 
is an anthology of short stories about inex- 
tricably linked people and the impact of 
emerging technologies on their relation- 
ships. A talented set of authors, with deft 
editing by Sheila Williams, explore the 
full spectrum of intimacy and technology 
to great effect. As an added visual treat, 
illustrations by Tatiana Plakhova punc- 
tuate each story with a blend of science, 
mathematics, and art that complements 
the subject matter. 

Even with the length limitations of a 
short story, the world-building in this 
compilation is frequently full and often 
insidiously terrifying, particularly in those 
stories that use the familiar as bread- 
crumbs to lure the reader in. The very first 
tale, “Invisible People” by Nancy Kress, be- 
gins with a mundane morning routine and 
carefully layers in a story about two par- 
ents reeling from an unsanctioned genetic 
experiment on their child. In “Don’t Mind 
Me,” Suzanne Palmer uses the shuffle be- 
tween high school classes as a foundation 
on which to build a story about how one 
generation uses technology to enshrine its 
biases and inflict them on the next. The 
ethical implications in these stories of- 
fer fodder enough for plenty of late-night 
discussions. It is also chilling how entirely 
possible many of the fictional futures seem. 

But looking forward need not always be 
bleak. This volume balances darker-themed 
stories with those in which technology and 
people collide in uplifting and charming 
ways. In Mary Robinette Kowal’s “A Little 
Wisdom,” for example, a museum curator, 
aided by her robotic therapy dog-cum- 
medical provider, finds the courage within 
herself to inspire courage in others and save 
the day. Meanwhile, in Cadwell Turnbull’s 
“Mediation,” a scientist reeling from a ter- 
rible loss finally accepts her personal AI’s 
assistance to start the healing process. And 
in arguably the cheekiest tale in this com- 
pilation, “The Monogamy Hormone,” An- 
nalee Newitz tells of a woman who ingests 
synthetic vole hormones to choose between 
two lovers, delivering a classic tale of rela- 
tionship woes with a bioengineered twist. 

With such a dizzying array of technologies 
discussed in relation to a range of human 
emotion and behavior, readers may experi- 
ence cognitive whiplash as they move from 
one story to the next. But it is definitely 
worth the risk. The 10 very different thought 
experiments presented in this volume make 
for a fun ride, revealing that human relation- 
ships will continue to be as complicated and 
affirming in the future as they are today. I 
would recommend the Netflix approach to 
this highly readable collection: Binge it in 
one go, preferably with a friend. 


Entanglements: Tomorrow’s Lovers, Families, and 
Friends, Sheila Williams, editor, MIT Press, 2020. 240 pp. 


Therapy robots, like this plush seal, are moving out 
of the realm of science fiction and into reality. 


Predict and Surveil 


Reviewed by Joseph B. Keller 


The U.S. police system is experiencing a 
reckoning. Protesters across the country 
(and around the world) have taken to the 
streets, arguing that police brutality dispro- 
portionately harms minority communities, 
and the current value of policing is being 
debated by city councils, lawmakers, and 
members of the news media. Into this 
tumultuous context enters Sarah Brayne’s 
book, Predict & Surveil: Data, Discretion, 
and the Future of Policing. 

A sociologist by training, Brayne synthe- 
sizes interview data and field notes from 5 
years of observation within the Los Angeles 
Police Department, employing a firsthand 
ethnographic approach to reveal how big 
data are currently used in tech-forward po- 
lice departments in America. She chronicles 
both consequential and mundane interac- 
tions between officers, civilians, and data. 
For example, she documents officers upload- 
ing license plate numbers, field interview 
notes, traffic citations, and potential gang 
affiliations onto a private industry data plat- 
form, as well as their active surveillance of 
hotspots in Los Angeles predicted to be crim- 
inogenic. This fly-on-the-wall perspective 
captures the human aspect of a police force 
grappling with automated systems and ma- 
chine-learning decisions in real time, juxta- 
posing the experiences of individual officers 
with institutional directives being handed 
down from administrators and lawmakers. 

Many police departments contend that the 
adoption of predictive analytics can improve 
objectivity and transparency, reduce bias, 
and increase accountability. Yet Brayne’s 
book reveals how few of these metrics ac- 
tually improve with predictive policing and 
exposes the scant evidence that supports the 
idea that it reduces crime rates. On the con- 
trary, she insists, predictive policing raises 
glaring civil rights concerns and reinforces 
harmful racial biases. We all leave digital 
traces throughout our daily lives, and in- 
nocent people can be caught in the dragnet 
and cataloged in a digital criminal justice 
system, where a case can be built from be- 
nign data. Police unions, Brayne notes, often 
vehemently oppose the tracking of their own 
officers. She records incidents of officers 
turning off their car locator signals, for ex- 
ample, as well as other tactics used to thwart 
tech-infused managerial oversight. 

Many officers view policing as an art form 
rather than a scientific system that can be op- 
timized. To some, big data policing threatens 
their sense of police instincts and identity. 
“They worry that they will become nothing 
more than line workers and insist that their 
years of accumulated experiential knowledge 
is irreplaceable,” observes Brayne. 

Brayne’s book raises timely issues rel- 
evant to mass surveillance and policing 
amid a growing debate about facial recog- 
nition systems, which makes their omission 
from this work notable. Although banned 
in several major American cities, these sys- 
tems remain a common tool for identifying 
potential offenders, despite abundant evi- 
dence of dangerous inconsistencies. 

Predictive policing can drive societal in- 
equalities, but Brayne suggests that reduc- 
ing instances of general police contact may 
mitigate disparities. In addition to offering 
immediate recommendations for changing 
law enforcement in the digital age, she as- 
serts that effective programmatic reforms 
are typically influenced by external social 
organizing and guided by communities. 
(The likelihood of real transformation from 
within the police system is small, she be- 
lieves.) For judicial and policing institutions 
genuinely seeking reform, this book pro- 
vides powerful observations and analysis 
that suggest how we can begin. 


Predict and Surveil: Data, Discretion, and the 
Future of Policing, Sarah Brayne, Oxford University 
Press, 2020. 224 pp. 
HUMAN GENOMICS 


Searching for sex differences 


Evolved sex differences in gene expression are pervasive, but so too is sampling bias 


By Melissa A. Wilson 


The behemoth effort, started a decade 
ago, known as the Genotype-Tissue 
Expression (GTEx) Consortium aims 
to discover how DNA variation affects 
gene expression across human tissues 
(1, 2). As part of this consortium, on 
page 1331 of this issue, Oliva et al. (3) find 
that more than one-third of genes show 
sex-biased expression in at least one tissue. 
Four other GTEx studies, on pages 
1318, 1334, 1333, and 1332 of this 
issue, respectively, discuss the ef- 
fects of gene regulation in human 
tissues (4), identify functional 
rare genetic variation (5), study 
predictors of telomere length (6), 
and report cell type-specific gene 
regulation (7). What is especially 
notable about Oliva et al. is the 
careful analysis, which revealed 
that in addition to reported ge- 
netic and hormonal effects (8), 
there are cell type-specific sex 
differences in tissue composi- 
tion. Furthermore, their work 
highlights that rather than being 
strictly dimorphic, interindividual 
variation results in overlapping 
distributions of gene expression 
between the sexes. 

It has been hypothesized that 
selection shaped sex differences 
in immune function in response 
to the evolution of pregnancy and 
the placenta in mammals, begin- 
ning more than 90 million years 
ago and contributing to the ob- 
served sex differences in diseases 
today, including a female bias in 
autoimmune disease and male bias in most 
cancers (9). Sex differences in gene expres- 
sion are broadly shared across mammals, 
but their role in shaping sex differences in 
disease etiology has not been thoroughly 
explored. Oliva et al. report that genes that 
show differences between sexes are enriched 
for multiple pathways, including in immune 
responses and cancer. Furthermore, they 
identify sex differences in a cluster of genes 
that target histone H3 lysine 27 trimethyl- 
ation (H3K27me3) sites; these histone marks 
have also been reported to show sex-differentiated 
expression in the placenta (10). Oliva 
et al. provide a comprehensive baseline for 
sex differences in gene expression in unaf- 
fected tissues that can be used for future 
comparisons with diseased tissues. These 
observations may also inform about which 


Sex differences in gene expression vary across the genome and 
between individuals, as represented by these heatmaps. 


pathways are most important in sex differ- 
ences in disease etiology and aid in the de- 
velopment of targeted therapies. 

Oliva et al. identified hundreds to thou- 
sands of genes (1.3 to 12.9% of the genes 
expressed per tissue) that show sex differ- 
ences in gene expression in any given tissue 
but found that the effect for each individual 
gene is subtle (the median fold change in ex- 
pression was just 1.04). This is after account- 
ing for the cellular composition of tissues 
that came from males versus females. The 
authors hypothesize that this sex difference 
in cell type composition, particularly of immune-related 
cells such as monocytes and 
neutrophils [previously reported by (11)], 
may contribute to the underlying sex-specific 
dysregulation for some diseases. These sex 
differences in cell type composition affect 
estimates of gene expression and, if unac- 
counted for, can skew results that compare 
groups with unequal sex ratios. Future stud- 
ies that include samples from males and fe- 
males will now need to account 
for cell type composition in addi- 
tion to sex chromosome comple- 
ment and hormonal environment. 
This is because different sex ratios 
in cell type in cases versus con- 
trols may drive the gene expres- 
sion signal more than the pheno- 
type of interest. 
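
The confounding described here can be sketched with a few lines of arithmetic. The Python example below is a minimal illustration under invented assumptions: the cell types, per-cell expression values, cell fractions, and group sizes are all hypothetical and are not taken from Oliva et al. or the GTEx data. It shows how a case group and a control group with different sex ratios can differ in bulk expression even when no disease effect exists.

```python
# Hypothetical per-cell expression of one gene in two cell types (arbitrary units).
PER_CELL_EXPRESSION = {"immune": 10.0, "other": 2.0}

# Hypothetical average tissue composition by sex (fractions sum to 1).
CELL_FRACTIONS = {
    "male": {"immune": 0.30, "other": 0.70},
    "female": {"immune": 0.20, "other": 0.80},
}


def bulk_expression(sex):
    """Bulk tissue expression as a composition-weighted average of cell types."""
    fractions = CELL_FRACTIONS[sex]
    return sum(fractions[cell] * PER_CELL_EXPRESSION[cell] for cell in fractions)


def group_mean(n_male, n_female):
    """Mean bulk expression of a group with the given number of donors of each sex."""
    total = n_male * bulk_expression("male") + n_female * bulk_expression("female")
    return total / (n_male + n_female)


# No disease effect is modeled anywhere: the two groups differ only in sex ratio.
cases = group_mean(n_male=80, n_female=20)
controls = group_mean(n_male=50, n_female=50)
apparent_difference = cases - controls

print(f"cases={cases:.2f} controls={controls:.2f} difference={apparent_difference:.2f}")
```

With these made-up numbers the script prints a case mean of 4.24 against a control mean of 4.00. The gap comes entirely from the different sex ratios acting through cell type composition, which is why the composition would need to be modeled alongside sex in a real comparison.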

Although the genes with the 
highest fold change in expression 
were found on the X chromosome, 
the X chromosome contains only 
4% of genes with sex-differential 
expression; the remaining 96% 
are spread across the genome (3). 
This is important because the X 
chromosome is often excluded 
from genome-wide analyses (12), 
but in doing so, studies may be 
missing genes with the largest ef- 
fects. Additionally, Oliva et al. call 
attention to the importance of au- 
tosomal (non-sex chromosome) 
gene regulation in contributing to 
sex differences in humans. Given 
this, it is also noteworthy that 
they show that sex-biased auto- 
somal gene expression is not very 
specific for predicting the sex 
of the donor from which the sample was 
taken (84% accurate, with 56% specificity), 
emphasizing how labile sex-biased gene ex- 
pression is across people. 

The GTEx Consortium has generated an 
invaluable resource through the generous 
involvement of patients and their families. 
However, as in many consortia, sampling biases 
hinder investigation of interindividual 
variation. Details about the sampling are, 
with much appreciation, made transparent 
by the consortium on the GTEx Portal (13). 
Just over two-thirds (67.1%) of the samples 
are from males. This means that studies are 
unevenly statistically powered to detect sex 
differences. It also means that studies that 
use the GTEx data as a reference set for 
comparison with disease state, for example, 
should take into account the relative propor- 
tion of samples from males and females (in 
both sets) because the relative sex effects on 
gene expression may not be the same in both. 
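
One way to gauge what such an imbalance costs is to compare the standard error of a male-versus-female difference in mean expression under the reported split with the standard error under an even split. The Python sketch below assumes a simple two-group comparison of means with equal variance in both groups. The total of 1000 samples is a hypothetical round number, and only the 67.1% figure comes from the text.

```python
import math


def se_of_difference(n_a, n_b, sd=1.0):
    """Standard error of a difference in two group means (equal variance assumed)."""
    return sd * math.sqrt(1.0 / n_a + 1.0 / n_b)


def relative_efficiency(fraction_a):
    """Variance under a 50/50 split divided by variance under the given split."""
    return 4.0 * fraction_a * (1.0 - fraction_a)


TOTAL = 1000            # hypothetical sample count, chosen for easy arithmetic
n_male = 671            # 67.1% male, the proportion reported in the text
n_female = TOTAL - n_male

se_observed = se_of_difference(n_male, n_female)
se_balanced = se_of_difference(TOTAL // 2, TOTAL // 2)
efficiency = relative_efficiency(n_male / TOTAL)

print(f"SE at 67.1% male: {se_observed:.4f}")
print(f"SE at 50% male:   {se_balanced:.4f}")
print(f"relative efficiency: {efficiency:.3f}")
```

Under these assumptions the relative efficiency is about 0.883, meaning the unbalanced design estimates a sex difference about as precisely as a balanced design with roughly 88% as many samples. Real analyses involve covariates and tissue-specific sample counts, so this is only a first-order picture.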

Also, more than half of the samples come 
from people 50 years and older. This means 
that the samples are skewed toward under- 
standing gene expression in tissues that have 
had many different exposures, potentially 
contributing to the observed interindividual 
variation, and does not reflect expression 
of tissues across the life span. Considering 
variation across the life span is especially 
critical for understanding how puberty and 
menopause, for example, affect gene regula- 
tion between the sexes. 

Last, representation of global human 
genetic variation is low, with nearly 85% 
of samples collected from white people of 
European descent. There is a dearth of in- 
formation about genetic variation and gene 
expression outside of a narrow range of re- 
cent genetic ancestries (14). This is critical 
for human health because inferences about 
genetic risk from one group of people with 
recent shared ancestry often do not general- 
ize to others (15). 

Given these limitations of the samples, it 
is even more surprising—and should be mo- 
tivating to human geneticists—how much 
interindividual variation is observed in 
gene expression among the people included 
in the GTEx Consortium. This should be a 
call to projects to expand the representation 
of human variation in future studies. 
SEISMOLOGY 


Quiet Anthropocene, quiet Earth 


Seismic noise levels that correlate with human activities 
fell when pandemic lockdown measures were imposed 


By Marine A. Denolle and Tarje Nissen-Meyer 


Our planet vibrates incessantly, sometimes 
with notable but more often 
with imperceptible intensity. 
Conventional seismology attempts to 
decipher vibrational sources and path 
effects by studying seismograms 
(records of vibrations measured with seismometers). 
In doing so, scientists seek either 
to understand the tectonic processes that lead 
to strong ground motions and earthquake 
failure (1) or to probe otherwise inaccessible 
planetary interiors (2). Progress in these ar- 
eas of research typically has relied on the rare 
and geographically irregular occurrence of 
large earthquakes. However, anthropogenic 
(human) activities at Earth’s surface also gen- 
erate seismic waves that instruments can de- 
tect over great distances. On page 1338 of this 
issue, Lecocq et al. (3) report on a quieting 
of anthropogenic vibrations since the start of 
the severe acute respiratory syndrome coro- 
navirus 2 (SARS-CoV-2) pandemic. 

Seismology has benefited from a surge in 
seismic data volume, computational power, 
and corresponding methodological devel- 
opment. These advances have enabled seis- 
mologists to branch away from traditional 
source and subsurface characterization of 
the energy from earthquakes and human- 
made blasts. The expansion of seismic net- 
works has allowed the observation of pre- 
viously unseen natural processes as diverse 
as wildlife activity (4), bed load transport in 
rivers, glacier sliding (5), and surface-mass 
wasting (6). In particular, scientists use 
continuous, ambient seismic vibrations to 
probe volcanic activities (7) and groundwa- 
ter resources (8), to track storms (9), and to 
decipher ice sheet processes (10). 

Human cultural noise carries seismic sig- 
natures mostly at frequencies above 1 Hz, 
whether the source is transient (entertain- 
ment; individual cars, trains, or planes), har- 
monic (wind turbines, machinery), or diffuse 
(railroads, highways) (11, 12) (see the figure). 
Overall, anthropogenic seismic noise levels 
have increased over the past few decades, 
and there is a clear positive correlation between 
this increase and gross domestic product 
(13). But when the SARS-CoV-2 pandemic 
began to ravage the planet, humans—and 
Earth—went quiet. 

Through a global analysis of seismic noise 
levels, Lecocq et al. found that most sites ex- 
perienced a drastic reduction in noise levels 
in the 4- to 14-Hz frequency band. This reduc- 
tion was much greater than those observed 
during the annual noise-level cycles of national 
or religious holidays. Daily CO₂ emissions 
fell only 11 to 25% (14), whereas anthropogenic 
vibrations dropped by 75% in most 
countries that imposed lockdown measures. 
Among countries with the greatest noise re- 
ductions were China, Italy, and France—all 
densely populated places with strong government 
responses (that is, with high virus-containment 
indices) (15). 
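
As a rough illustration of what measuring noise in a band such as 4 to 14 Hz involves, the Python sketch below computes the root-mean-square amplitude of a signal within a frequency band using a plain discrete Fourier transform. It is not the processing used by Lecocq et al. The synthetic traces, the 50-Hz sampling rate, the 0.5-Hz background tone, the 8-Hz "anthropogenic" tone, and the amplitudes are all assumptions made up for the example.

```python
import cmath
import math

SAMPLE_RATE_HZ = 50.0   # hypothetical sampling rate
N_SAMPLES = 200         # 4 s of data, so DFT bins are 0.25 Hz apart


def band_rms(samples, sample_rate_hz, f_low_hz, f_high_hz):
    """RMS amplitude of the signal content between f_low_hz and f_high_hz."""
    n = len(samples)
    power = 0.0
    for k in range(1, n // 2 + 1):          # positive-frequency bins only
        freq_hz = k * sample_rate_hz / n
        if not f_low_hz <= freq_hz <= f_high_hz:
            continue
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)
        )
        # Each positive-frequency bin has a mirror image at negative frequency,
        # except the Nyquist bin, so its power is counted twice.
        weight = 1.0 if 2 * k == n else 2.0
        power += weight * abs(coeff) ** 2 / n ** 2
    return math.sqrt(power)


def synthetic_trace(anthropogenic_amplitude):
    """Made-up trace: a 0.5-Hz background tone plus an 8-Hz 'human' tone."""
    trace = []
    for i in range(N_SAMPLES):
        t = i / SAMPLE_RATE_HZ
        background = math.sin(2 * math.pi * 0.5 * t)
        human = anthropogenic_amplitude * math.sin(2 * math.pi * 8.0 * t)
        trace.append(background + human)
    return trace


before = band_rms(synthetic_trace(1.0), SAMPLE_RATE_HZ, 4.0, 14.0)
during = band_rms(synthetic_trace(0.25), SAMPLE_RATE_HZ, 4.0, 14.0)
reduction = 1.0 - during / before

print(f"band RMS before: {before:.3f}")
print(f"band RMS during: {during:.3f}")
print(f"reduction: {reduction:.0%}")
```

Because the 0.5-Hz background tone lies outside the band, only the 8-Hz component contributes to the result, and quartering its amplitude shows up as a 75% reduction in band RMS. Real seismograms are broadband and nonstationary, so practical workflows average spectra over many time windows instead of relying on a single transform.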

Lecocq et al. also detected a correlation 
between seismic data and new types of time 
series, such as urban audible sound from 
acoustics data and cell phone mobility data. 
The authors observed the greatest correla- 
tions between seismic noise levels and two 
common types of pandemic mitigation: sur- 
face transportation and nonessential busi- 
ness activities. Lecocq et al. did not detect 
a strong correlation between lockdown and 
seismic noise reduction at other frequency 
bands, which might be explained by cer- 
tain uninterrupted human activities such as 
power generation (14). 

For all its hardships, the lockdown has un- 
locked a door to scientific inquiry into envi- 
ronmental noise and global collaboration. At 
a fundamental level, low noise benefits tra- 
ditional seismology, hence the recent noise 
decrease might open new windows of oppor- 
tunity; study areas hindered by urban noise 
might now be targets for detecting microseis- 
micity or for improved subsurface imaging. 
The crucial next step, as ever in seismology, 
is to determine the causative nature of these 
signals beyond their correlation—thus turn- 
ing anthropogenic noise into informative 
signals that allow scientists to address new 
questions. For example: Is there feedback 
between anthropogenic vibrations and Earth 
processes? And will seismic monitoring of 
anthropogenic and environmental activities 
become complementary, economically valu- 
able alternatives to conventional techniques? 
To achieve these advances, seismologists 
must develop new ways of processing data 
and modeling and interpreting results.

Lecocq et al. exemplify seismological 
progress through best practices in scien- 
tific research: public data, open-access soft- 
ware and hardware, global cooperation, and 
crowdsourcing of citizen-science projects. 
All of the data are publicly available through 
open-access data centers at the Incorporated 
Research Institutions for Seismology (IRIS), 
which hosts and redistributes real-time 
seismograms from most of the stations 
participating in the Federation of Digital 
Seismograph Networks archive. A large pro- 
portion of the data used in the Lecocq et al. 
study was measured on seismic instruments 
that are powered on open-source Raspberry 
Pi computers hosted by citizen scientists. The 
Raspberry Shake network counts more than 
3500 stations globally, all installed in homes, 
schools, and research institutions at 2 to 7% 
of the cost of conventional research or indus- 
trial sensors. The authors performed data 
analyses with the open-source Python software
ObsPy, demonstrating the prevalence and
usefulness of open-source community codes 
in modern science. 

Like the pandemic, the seismological 
community also is shaking up norms. One 
important example is the reorganization of 
research activities. Although physical bor- 
ders are closed, Lecocq et al. demonstrate 
that, much like the global medical research 
on SARS-CoV-2, seismological research is 
and ought to be without borders. The new 
study represents scientists from 25 countries 
on five continents, and the authors shared
the manuscript on public editing platforms
(Google Docs, Slack) that allowed for all 
members of the community to contribute. 
Indeed, social seismology, which directly re- 
lates human activities and seismic waves, has 
sparked enthusiasm in the scientific commu- 
nity for urban seismology. The fall meeting of 
the American Geophysical Union (December 
2020) will highlight the imminent wave of 
SARS-CoV-2-related seismological science in 
a special session called “Social Seismology.” 
Figure. Humans and nature excite seismic waves
Seismometers record vibrations from everything,
not only earthquakes. Shown are sources that induce
seismic waves of different vibration modes
(harmonic, diffuse, transient), detectable over
large distances. Panels: anthropogenic activities
(harmonic), ocean noise (diffuse), and natural
hazards (transient).

Published by AAAS 


FERROELECTRICS 


A key piece of the ferroelectric hafnia puzzle


Dipolar slices explain the origin of ferroelectricity
in a material now used for memory devices


By Beatriz Noheda and Jorge Íñiguez


The ferroelectrics community is witnessing
one of those moments in
which serendipity changes the course
of science. The story of ferroelectric
hafnia (HfO₂) resembles that of
Cinderella: Not invited to the polar
dielectrics ball, nanoscale HfO₂ was dismissed
as not being a real ferroelectric, a
material that has a switchable spontaneous
polarization, despite the experimental
evidence for this response. On page 1343 of 
this issue, Lee et al. (1) bring us closer to a 
real-life fairy tale ending with their theoreti- 
cal calculations, which show that nanoscale 
HfO₂ becomes a ferroelectric through a different
mechanism. Polarization manifests
in the form of two-dimensional (2D) slices 
separated by nonpolar spacers, associated 
with flat polar phonon bands that allow for 
homogeneous switching of electric dipoles. 
The story starts with research that began 
in 2006 but was not published until 2011 
(2). Scientists fabricating silicon transistors 
with HfO₂-based insulating layers spent several
years trying to explain the origin of a
strange peak observed in the capacitance- 
voltage characteristics. The peak looked 
very much like the ones observed in fer- 
roelectrics when an applied electric field 
switches the direction of the spontaneous 
polarization. This feature has made ferro- 
electrics one of the oldest nonvolatile semi- 
conductor memory types (3). 
However, ferroelectricity was unlikely
for two reasons: No polar phases had ever 
been reported in HfO₂, a refractory material
with a long history of research (4), and
these HfO₂ layers were only a few nanometers
thick. Ferroelectricity is not expected
at the nanoscale because it is a cooperative 
phenomenon. The local dipoles in ferroelec- 
tric materials, which result from the rela- 
tive displacement of positive and negative 
ions, interact electrically with the dipoles of 
the neighboring cells and have a tendency 
to align collectively in the same direction, 
akin to what happens in a ferromagnet 
with the electron spins. The collective or- 
dering leads to a spontaneous polarization. 
However, when the dimensions of the ferro- 
electric sample are small, as needed in mi- 
croelectronics, a substantial number of di- 
poles lie on its surface. The stabilization of 
the ferroelectric phase is hampered by the 
energy cost of the depolarizing electric field 
that such dipoles create inside and outside 
the ferroelectric material, as dictated by 
Maxwell’s equations. 

In nature, this electrostatic penalty is re- 
duced by domain formation, in which regions 
with alternating polarization (up and down) 
form in the sample. In theory, compensa- 
tion of the dipolar surface charges can also 
be achieved by sandwiching the ferroelectric 
in between two metallic electrodes. The free 
carriers of the metal should screen the polar- 
ization charges and eliminate the depolariz- 
ing field, avoiding the need to form domains. 
In practice, this approach does not work per- 
fectly with real metals, and screening is not 
complete (5). How to work around this issue 
has been one of the main research focuses 
of the ferroelectrics community for more 
than 30 years, driven by the vision of
ferroelectric nonvolatile memories that would
be faster, denser, and less power-consuming 
than their magnetic counterparts (6). 

Thus, even when the paper reporting on 
ferroelectric HfO₂ was published (2), the
ferroelectrics community largely dismissed 
this result as an artifact, assuming that a 
material that is not polar in bulk would not 
become polar at the nanoscale. Moreover, at 
the nanoscale, it is hard to distinguish fer- 
roelectric switching peaks from the voltam- 
metry characteristics that could arise from 
electrochemical reactions at interfaces (7). 
However, after many subsequent studies 
from several groups (8), the evidence for 
robust switching became difficult to ignore. 
The current consensus is that ferroelectric-like
switching in HfO₂-based ferroelectrics
does exist, but its origin is still highly de- 
bated. Only one or two reports have shown 
a ferroelectric phase transition in this mate- 
rial (9, 10). In addition, switching requires 
large applied fields and does not seem to
proceed as in other ferroelectrics through
movement of domain walls (11).

Figure. Figuring out a thin ferroelectric
The theoretical study by Lee et al. explains how an
ultrathin material, hafnia, can be a ferroelectric by
comparing it with a classical three-dimensional (3D)
ferroelectric. In both cases, alternating domains of
polarization (arrows representing ferroelectric dipoles
along the z crystal direction) eliminate net surface
charges and prevent depolarization.
2D dipolar crystal (order localized in planes, free
domain walls): In hafnia, the localization of dipoles
in layers that are separated by nonpolar atomic sheets,
represented by the gray “spacers,” allows domains to
form with no energy cost.
3D dipolar crystal (cooperative order, costly domain
walls): Domain formation creates an energy penalty
caused by the existing interaction between neighboring
dipoles in the y direction.

How HfO₂ becomes ferroelectric at the
nanoscale and how it screens polarization 
charges at surfaces are the main questions 
to resolve. The former has been explained 
by a combination of effects (surface energy, 
ordered dopants, and oxygen vacancies) 
that favor the occurrence of the polar phase 
(3, 7). The latter could be explained by the 
much lower dielectric permittivity of HfO₂
compared with other ferroelectrics, but why 
is it so low? 

Theoretical calculations by Lee et al. now 
show that ferroelectricity in HfO₂ is of a different
type (see the figure). The polar features
of HfO₂ are associated with a nearly
flat phonon band (similar frequency of the 
different modulations along the energy 
band). Thus, a homogeneous polar order, in 
which all electric dipoles align parallel as in 
a regular 3D ferroelectric phase, is as likely 
as any transversally modulated inhomoge- 
neous order in which an arbitrary sequence 
of ferroelectric domains is separated by
180° domain walls. Put differently, the domain
walls in HfO₂ have essentially zero energy
cost and a negligible width.

This situation, which is reminiscent of the 
effect called pressure-induced amorphiza- 
tion (12), has two important consequences: 
HfO₂ has essentially 2D polar instabilities,
meaning that a polar 2D plane (polariza- 
tion within the plane) can in principle ap- 
pear by itself, even if the rest of the material 
remains nonpolar. The polarization of such 
2D slices has a very small electrostatic pen- 
alty (depolarizing field) associated with it, 
much smaller than that for 3D polar order, 
which helps explain why ferroelectricity can 
occur in HfO₂ at the nanoscale.

Also, the 2D polar slices are all but decoupled
from each other, so in HfO₂, the switching
of one domain has no effect on its surrounding
domains. Lee et al. argue that this
process must have dramatic effects on how
ferroelectric switching proceeds in this material
because nucleation of reversed domains
is not followed by growth, which should yield 
very large coercive fields, as is indeed ob- 
served. The occurrence of individual switch- 
ing of 2D polar planes offers the possibility of 
multilevel polarization switching with ideally 
as many intermediate states as the number of 
unit cells. This capability is of much interest 
for adaptable electronics and brain-inspired 
computing applications. 
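
The link between decoupled slices, free domain walls, and multilevel switching can be illustrated with a toy model. The Python sketch below is not the first-principles calculation of Lee et al.; the Ising-like energy function, the `coupling` parameter, and the function names are assumptions introduced purely for illustration. Setting `coupling` to zero stands in for the idealized flat-band limit.

```python
from itertools import product

def chain_energy(slices, coupling):
    """Energy of a stack of polar slices, each polarized up (+1) or down (-1).

    Hypothetical nearest-neighbor model: each pair of adjacent slices with
    opposite polarization (a domain wall) costs 2*coupling relative to an
    aligned pair.
    """
    return -coupling * sum(a * b for a, b in zip(slices, slices[1:]))

def summarize(n_slices, coupling):
    """Count distinct energies and list net-polarization levels over all states."""
    configs = list(product((+1, -1), repeat=n_slices))
    energies = {round(chain_energy(c, coupling), 12) for c in configs}
    # Net polarization in units of a single slice's dipole moment.
    levels = sorted({sum(c) for c in configs})
    return len(energies), levels

n = 6
distinct_3d, _ = summarize(n, coupling=1.0)          # cooperative case: walls cost energy
distinct_flat, levels = summarize(n, coupling=0.0)   # idealized flat-band limit
print("distinct energies with coupling:", distinct_3d)
print("distinct energies without coupling:", distinct_flat)
print("net polarization levels:", levels)
```

With coupling, the energy depends on the number of domain walls; without it, every arrangement of up and down slices is degenerate, and the net polarization can take one more value than the number of slices.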

Lee et al. have found that a flat phonon 
band gives rise to dipolar localization, a 
phenomenon reminiscent of localization ef- 
fects for electrons, photons, and other par- 
ticles but whose implications in the case of 
ferroelectrics have not been fully explored. 
In this way, dipolar order can occur with- 
out the need for cooperative 3D behavior, 
allowing miniaturization and multivalued 
nonvolatile storage. The next step will be 
to use this knowledge to engineer lower 
switching voltages for memory applications 
in this material that is already compatible 
with silicon electronics.
MEDICINE


Modulating gut microbes 


Fecal microbiota transplant and modulation of microbial 
species show therapeutic promise 


By Jennifer A. Wargo 


There are hundreds of trillions of microbes
within the human body, which
have a profound impact on modulat- 
ing host function. Many of these mi- 
crobes reside in the gastrointestinal 
tract and have been shown to influ- 
ence normal physiology across all body 
systems (1). Disruptions in the delicate bal- 
ance of microbes within the gut and other 
niches are associated with numerous dis- 
ease states—including neurologic 
disorders, cardiovascular disease, 
gastrointestinal disorders, and even 
cancer (2). Accordingly, there is 
intense interest in targeting these 
microbes to promote overall health 
and to abrogate disease, with con- 
siderable advances made recently. 
Strategies to modulate gut microbes 
include fecal microbiota transplant 
(FMT), which involves the transfer 
of fecal material from one individ- 
ual to another for a desired physi- 
ologic effect. This approach, among 
other gut microbiota modulation 
strategies, has shown promise in 
treating several disease conditions, 
although opportunities exist to iter- 
ate and build on these approaches. 
The idea that disruptions in 
the gastrointestinal tract could 
contribute to systemic disease 
was championed centuries ago by 
Hippocrates, a physician in ancient 
Greece. Strategies to modulate the 
composition of the gut have also 
been around for centuries, with the first 
reports of the use of FMT dating back to 
the fourth century BCE in China where 
fecal preparations were used to treat gas- 
trointestinal disorders (3). Parallels have 
also been observed in the animal king- 
dom, where coprophagia (ingesting fecal 
material) is common and may confer an 
increase in gut microbial diversity and 
associated enhancements in host func- 
tion for digestion and other physiologic 
processes. However, the first successful 
clinical application of FMT was not published
until 1958 with the report of FMT
from healthy donors used for patients with
pseudomembranous enterocolitis from
Clostridioides difficile infection (CDI) (4). 
Numerous clinical trials have since been 
undertaken, using FMT and other gut 
microbiota modulation strategies to treat 
diseases of the gut (such as CDI and inflammatory
bowel disease, IBD) as well as
other systemic diseases—including meta- 
bolic syndrome, autism, multiple sclerosis, 
Parkinson’s disease, and even cancer (2). 


Figure. Strategies to alter gut microbiota
Fecal microbiota transplant (FMT) involves transfer of fecal microbiota from
a donor to another individual. Alternatively, microbial consortia (targeted
formulations used to augment host microbiota) are being developed. Diet,
prebiotics, and postbiotics can also influence the microbial community.
Panel labels: diet, pre-/postbiotics; gut microbiota.

To date, many of the strategies to target 
gut microbes have involved the two ex- 
tremes: either transfer of entire microbial 
communities (by using FMT) or transfer of a 
single microbial taxon. However, a growing 
number of approaches are now being de- 
veloped as more is learned about the func- 
tional aspects and physiologic impact of 
microbes throughout the body. These itera- 
tive approaches transcend efforts that focus 
on taxonomic characterization of microbial 
niches through next-generation genomic 
sequencing, incorporating interrogation of 
functional characteristics of gut microbes 
(by metabolomic profiling and studies in 
preclinical models) to mediate the desired 
physiologic response. This has led to a host 
of therapeutic strategies from microbial
consortia to pre-, pro-, and postbiotic in- 
terventions. Nonetheless, much still needs 
to be learned to implement true “precision” 
modulation of the gut microbiota. 

When considering strategies to modulate 
the gut microbiota, the indication for inter- 
vention in the intended population must be 
considered. Gut dysbiosis, an imbalance in 
the composition of commensal microbial 
communities, has been linked to numer- 
ous disease states, substantiating the use of 
FMT and other gut microbiota modulation 
strategies (5). This link is fortified by data 
demonstrating that although there has been 
a decrease in infectious diseases over the 
past several decades with the widespread 
use of antibiotics, there has been a concur- 
rent increase in allergy and autoimmune 
diseases (6) presumably at least partially 
due to disruption of the gut microbiota. 
Notably, some of the diseases being 
treated by gut microbiota modu- 
lation have a profound dysbiosis 
(such as CDI), whereas others have 
a more subtle disruption of gut mi- 
crobes, which has implications for 
choosing the appropriate strategy 
for gut microbiota modulation. 

Numerous other factors should 
be taken into account when con- 
templating modulation of the gut 
microbiota. These include the 
means of gut microbiota modulation,
preparative regimen, measurement
of engraftment of gut
microbes and of the desired physiologic
effect, and concurrent dietary
intake (7). In general, the
approach aims to restore a more
“healthy” gut microbial community,
although the definition of
a “healthy” gut microbiota is not
clearly established. However, data
suggest that a diverse microbial
community with a high degree of
functional redundancy is associated
with better overall health (2) and better
outcomes in several disease states (8, 9).
The most successful application of FMT 
thus far is in the treatment of refractory CDI, 
where treatment with FMT has been shown 
to be generally safe and highly effective (2). 
Nonetheless, guidelines for proper treatment 
and screening of donor stool are critical for 
safety and include screening for infectious 
diseases and disorders that are associated 
with perturbations of the gut microbiota, as 
well as the use of medications that can affect 
gut microbes such as antibiotics and proton 
pump inhibitors (10). Notably, these guidelines
are iterative, as new recommendations
are made to expand screening and testing of
donors based on insights gained from ongoing
trials. For example, screening of donors
for multidrug-resistant organisms and severe
acute respiratory syndrome coronavirus
2 (SARS-CoV-2) is now recommended. This 
follows reports of several patients with CDI 
who developed systemic infections with
antibiotic-resistant bacteria following
FMT (11), as well as concerns about possible
infections with SARS-CoV-2. 

The use of FMT is being investigated 
across numerous other disease conditions, 
although most of these are associated with a 
less profound dysbiosis and greater hetero- 
geneity in assessed endpoints and outcomes. 
However, there is clear evidence of success 
in some trials across a number of indications,
including IBD, after hematopoietic
stem cell transplant, and in autism spectrum
disorders (5). Limitations in measuring efficacy
in FMT trials may arise from “true
negative” results, or from numerous other 
confounding factors, including features not 
dependent on gut microbes that contrib- 
ute to the development and persistence of 
disease in the recipient, as well as variabil- 
ity in trial design and outcome measures. 
Additionally, there may be factors inherent 
to the FMT donor that may affect efficacy 
(such as composition and functional aspects 
of the transplanted microbiota); however, 
such “donor effects” may be less prominent 
for indications in which a more profound 
dysbiosis is present, such as in CDI and 
even IBD (12). Optimal dosing and route 
of delivery for FMT are also incompletely 
understood and may be context dependent. 
Additional studies are critically needed to 
interrogate the success (or failure) of this 
approach for these indications and to de- 
velop optimal strategies for use of FMT. 

One attribute of FMT not possessed by 
other strategies to modulate the gut micro- 
biota is the diversity of microbes that may 
be administered (including not only bacte- 
ria, but also viruses, fungi, and archaea) (see 
the figure). This provides potential func- 
tional redundancy for favorable impact on 
host physiology. In a profoundly dysbiotic 
state, this diversity represents a potential 
advantage over strategies that administer 
minimal-complexity microbial consortia, 
which may not engraft and may not be suf- 
ficient in reestablishing a “favorable” gut 
microbiota. However, the same attribute of 
increased diversity and complexity of FMT 
is also a limitation that creates issues with 
reproducibility and scalability. 

There are also concerted efforts under 
way to develop consortia of microbes that 
can be reliably and consistently manufac- 
tured and administered to favorably modu- 
late the gut microbiota to address gastro- 
intestinal and systemic disease, offering 
improved scalability over FMT. This includes
commercially available probiotics, which
are live microorganism preparations with 
presumed health benefits. The impact of ad- 
ministration of many of these formulations 
across disease indications has been studied 
in clinical trials with mixed results, and to 
date none of these commercially available 
formulations are approved for use by major 
regulatory bodies such as the U.S. Food and 
Drug Administration (13). However, next- 
generation live biotherapeutics (live micro- 
organisms developed as therapeutic agents 
with defined clinical benefit claims) are now 
being developed based on insights gained 
from sequencing data in human cohorts and 
from studies in preclinical models (13)—with 
many now in clinical trials. 

The first wave of these next-generation 
live biotherapeutics focused mainly on 
taxonomy—incorporating single or several 
bacterial taxa within a consortium based on
insights gained from profiling gut micro- 
bial species in human cohort studies and 
in preclinical models. An example of this 
is in cancer immunotherapy: Clinical trials 
are now under way using modulation of the 
gut microbiota through administration of 
microbial consortia (7). These formulations 
range from simple (monoclonal microbial 
formulations) to complex (involving consor- 
tia of 50 or more bacterial taxa and strains). 
However, there is a growing appreciation 
that focusing on the functional aspects of 
these microbes may be far more important 
than simply focusing on taxonomy, and ge- 
netically modified organisms are now being 
developed with a wide range of functional 
attributes (13). Although overall these for- 
mulations are generally well-tolerated, 
safety still needs to be taken into account 
because there are reports of bacterial trans- 
location of these organisms from the gut 
into the bloodstream in critically ill pa- 
tients receiving gut microbiota modulation 
through administration of commercially 
available probiotics (14).

Another strong consideration in gut mi- 
crobiota modulation is the role of diet and 
prebiotics, as these can profoundly influ- 
ence existing commensal gut microbes and 
those administered for therapeutic intent. 
These may ultimately serve as a stand-alone 
intervention in appropriate individuals with 
more subtle gut dysbiosis. Studies
have shown that large changes in diet
can have a marked impact on gut microbes
and associated physiology in the short term
(15). However, the gut microbiota reliably reverts
to a preintervention state if the instituted change
in diet is not sustained. Nonetheless, nu- 
merous dietary intervention studies are 
currently under way (7), ranging from a 
somewhat simple intervention of adding 
one cup of canned beans per day to existing 
diets (NCT02843425) to extended (or longer-term)
dietary interventions, where meals are
prepared for (and shipped to) participants 
(NCT03950635). Such dietary modifications 
have potential relevance even if recipients 
are also treated with other gut microbiota 
modulation strategies such as FMT or live 
biotherapeutics, as they may sustain and pro- 
mote optimal function of the transferred gut 
microbes, although optimal approaches of 
dietary intervention in these scenarios have
yet to be defined. The use of prebiotic supple- 
mentation (such as resistant starches, poly- 
phenols, and polyunsaturated fatty acids) is 
also being studied, because these compounds 
may provide optimal substrate to beneficial 
commensal (or administered) microbes. 

It is becoming evident that modulation 
of gut microbes will be increasingly em- 
ployed to promote overall health and to 
help treat disease, although optimal strate- 
gies for “precision” gut microbiota modula- 
tion remain incompletely understood. It is 
probable that a personalized approach will 
be needed, incorporating strategies such as 
FMT, administration of live biotherapeutics, 
dietary strategies, and prebiotics—although 
it is not inconceivable that an ideal “one- 
size-fits-all” approach could be identified. 
Through additional research and collabora- 
tive efforts, the true definition of dysbiosis 
in the gut microbiota as it relates to disease 
states can be better understood, as well as 
what constitutes an optimal gut microbiota 
to promote overall health, which could have 
broad impact for public health. 
ULTRACOLD PHYSICS


Laser cooling of larger quantum objects 


A nonlinear polyatomic molecule, CaOCH₃, has been laser-cooled to below 1 millikelvin


By Eric R. Hudson 


Surfers have a concept they call progression.
It is roughly the idea that each
successive generation of wave riders
is not constrained by the same idea of
what is “impossible.” Progression often
comes in small steps, usually helped
by improvements in technology, but every
so often, like Laird Hamilton’s Millennium
Wave at Teahupoo, Tahiti (1), it comes in a
giant leap when somebody does what everyone
else was too scared to try. On page 1366
of this issue, Mitra et al. (2) have progressed
molecular physics in a step that was unthinkable
only a few years ago by laser-cooling a
nonlinear polyatomic molecule, CaOCH₃.

Like stopping a bowling ball by repeatedly
hitting it with ping-pong balls (3), laser cooling
utilizes the repeated scattering of photons
from a particle to cool it to ultracold tempera- 
tures (<1 mK). The technique, which garnered 
the 1997 Nobel Prize, has been the workhorse 
of atomic physics for roughly three decades 
and underlies virtually all experiments in 
the field, in areas as diverse as quantum 
computing and timekeeping. Extending the 
technique to more complicated objects such 
as diatomic and polyatomic molecules holds 
promise for new routes to important quan- 
tum science and technology (4). For atoms, 
their simple electronic structure enables 
their quick relaxation, through spontane- 
ous emission, to only one of a few low-lying 
electronic states from which they can be re- 
excited. However, the ro-vibrational degrees 
of freedom of molecules lead to an increase 
in the number of low-energy states accessible
by spontaneous emission. In general, scattering
many photons from molecules requires
an unwieldy number of lasers to address all
of the possible low-lying ro-vibrational states.

Figure. Choosing molecules for laser cooling
A molecule can be laser-cooled if it absorbs and emits photons upon an electronic transition without changing its
vibrational state. Mitra et al. have extended laser cooling from linear molecules to CaOCH₃, a nonlinear polyatomic molecule.
Criteria for coolness: The Franck-Condon principle roughly dictates that this is possible for molecules that have similar
bond lengths in the ground and excited states. (Plots show interaction potential versus atom separation.)
Difficult to cool: In CaO, the excited-state potential is shifted relative to the ground state. As a result, decays
from the excited state populate many vibrational levels, making cooling difficult.
Easier to cool: In CaF, the range and shape of the excited-state potential are similar to those of the ground-state
potential. As a result, decays from the excited state typically do not change the vibrational state.
Cooling larger molecules: Recent work has suggested that the ligand on a metal atom need not be a simple atom but can
also be a functional group, which has allowed cooling of larger nonlinear symmetric top polyatomics such as CaOCH₃.
The size and symmetry limits of this design principle are the subject of current research. (Panel label: large organic molecule.)

The beginning of a solution to this prob- 
lem came in two steps. In 2004, Di Rosa (5) 
pointed out that certain diatomic molecules, 
whose atoms were bound by roughly the 
same force in their ground and excited elec- 
tronic states, were unlikely to change their 
vibrational state when spontaneously emit- 
ting a photon (see the figure, top right). In 
2008, Stuhl et al. (6) proposed a method, 
based on a so-called “type II” magneto- 
optical trap demonstrated in atoms (7) and 
only recently understood (8), to handle the 
pesky rotational degree of freedom. Less than 
a year later, Shuman et al. (9) provided the 
first experimental demonstration of these 
ideas on the molecule SrF. Since 
then, three-dimensional laser 
cooling and trapping have been 
demonstrated for a number of 
diatomics including CaF (10,
11), and laser cooling along one 
dimension has been observed in 
linear triatomic molecules such 
as SrOH (12). 
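
Di Rosa's criterion can be made quantitative with a standard textbook model. The Python sketch below assumes two harmonic potentials of identical curvature whose minima are displaced from each other; in that model, the Franck-Condon factors for decay from the lowest excited vibrational level follow a Poisson distribution set by a dimensionless displacement parameter (the Huang-Rhys factor). The function names and the two parameter values are illustrative assumptions, not measured values for CaF, SrF, or CaO.

```python
import math

def franck_condon_from_v0(huang_rhys, v):
    """Franck-Condon factor for decay from excited v'=0 to ground-state level v.

    Assumes displaced harmonic potentials with equal vibrational frequency,
    for which the factors are Poisson-distributed in v.
    """
    return math.exp(-huang_rhys) * huang_rhys ** v / math.factorial(v)

def photons_before_loss(huang_rhys):
    """Expected number of scattering events, up to and including the first
    decay that leaves v=0, when a single laser addresses only v=0."""
    q00 = franck_condon_from_v0(huang_rhys, 0)
    return 1.0 / (1.0 - q00)

# Hypothetical displacement parameters, chosen only to contrast the two regimes.
cases = (("nearly matched potentials", 0.02), ("shifted potentials", 2.0))
for label, s in cases:
    factors = [franck_condon_from_v0(s, v) for v in range(4)]
    formatted = ", ".join(f"{q:.3f}" for q in factors)
    print(f"{label}: q(0->0..3) = {formatted}; "
          f"photons before loss ~ {photons_before_loss(s):.1f}")
```

In this model, nearly matched potentials keep almost all decays in the original vibrational level, so one laser can scatter tens of photons before the molecule is lost, whereas shifted potentials spread the decays over many levels after a single photon or so. Real cooling schemes add repumping lasers for the few remaining levels, which this sketch does not model.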

The choice of these molecules 
can be understood by considering
their gross electronic structure
(13). For example, in CaF
and SrF, the ns² configuration of
the alkaline earth atom donates 
one electron to the halogen. 
The remaining valence elec- 
tron remains localized on the 
metal such that the molecular 
electronic structure resembles 
that of the alkaline earth, mak- 
ing a nearly ideal molecule for 
laser cooling. If, however, O is 
used instead of F, the resulting 
electronic structure of CaO is 
markedly different from that of 
the parent metal atom and the 
molecule is a poor choice (see 
the figure, top left). Interestingly, 
the OH group in the triatomics 
behaves essentially like a halo- 
gen atom, accepting one electron 
from the ns² configuration of the
metal and yielding an atomic-like molecular
electronic structure. 

Kozyryev et al. continued this trend of 
cooling larger, more complex molecules by 
showing that the methoxy group should also 
behave similarly to the halogen, and proposed
CaOCH₃ as a candidate for laser cooling
(see the figure, bottom) (14). Although
this choice may sound like a straightfor- 
ward expansion, the extension to a non- 
linear polyatomic molecule comes fraught 
with pitfalls. The spectroscopy of such mol- 
ecules is much more complicated than the 
diatomics and triatomics studied previously. 
Effects normally neglected in smaller mol- 
ecules can lead to a breakdown of the Born- 
Oppenheimer approximation and open new 
loss channels, such as coupling between the 
12 vibrational modes of the molecule. Thus, 
like the Millennium Wave, the experiment 
had every chance of failure, but Mitra et al. 
demonstrated laser cooling of CaOCH₃ along
one dimension of a beam down to a temperature
of ~700 μK. They also demonstrated
separate deterministic cooling of the two 
nuclear spin isomers. 

The immediate impact of this result is 
that we now know it is possible to laser-cool
molecules with non-C∞v symmetry.
This result opens the door to full three- 
dimensional cooling and trapping of a new 
class of quantum objects that possess pre- 
viously inaccessible properties such as chi- 
rality. If the history of laser cooling is any 
guide, this capability should enable major 
advances in quantum computing and sens- 
ing, timekeeping, chemistry, and precision 
tests of fundamental physics (15).

NANOTECHNOLOGY


Reactive polymers guide 
nanoparticle clustering 


Tailored polymer shells drive the assembly of a desired 
class of nanoparticle architectures 


By Oleg Gang

From the early days of nanoparticle
synthesis, researchers understood
the need for adaptable methods that
coordinated particles in clusters
with precise nanoarchitecture. The
similarity between the building of 
nanoscale clusters from particles and the 
formation of molecules from atoms goes 
beyond a structural analogy. As with mol- 
ecules, which demonstrate properties not 
found in atoms, clusters exhibit synergistic 
functional responses beyond the properties
of individual nanoparticles (1-3). For
atomic systems, the formation of molecules
is rationalized by orbital hybridization
principles. Because this concept is not
suitable for nanoparticles, scientists have
explored other particle-assembly methods.
On page 1369 of this issue, Yi et al. (4)
describe a high-yield method for the assembly
of targeted nanoparticle clusters.

Although a variety of methods exist for
building nanoparticle clusters, they all follow
a similar set of main concepts. In the
first approach (called “packing”), possible
mutual arrangements of attractive micron-sized
particles determine the formation
of specific clusters (5). Although a similar
approach can be applied at the nanoscale,
nanoparticles typically exhibit more complex
interactions caused by the ligand
shell grafted to their surface and the contribution
of electrostatic, van der Waals,
and other forces. For spherical nanoparticles,
the ligand shell might modify the cluster
structure expected for hard spheres, but
the isotropic nature of interactions typically
is preserved. Thus, a similar, albeit modified,
“packing” concept still applies; however,
the realization is often challenging.
For nanoparticles with multiligand shells
and complex interparticle interactions, intricate
organizations might occur, but control
over the resulting structures is limited.

Polymers guide cluster design
Nanoparticle A is grafted with charged copolymers (red lines) bearing a block of acidic groups;
B is grafted with neutral copolymers (blue lines) bearing a block of basic groups.

Interplay
A neutralization reaction drives attractive interactions,
whereas the steric groups induce repulsion.

Cluster formation
The acid:base ratio per particle governs the cluster's
architecture and defines the interparticle angle.

The central idea of this anisotropic bind- 
ing concept is to mimic atomic valence by 
establishing specifically located affinity 
patches on particles, forming so-called 
“patchy particles.” This idea has been in- 
tensively investigated theoretically (6), and 
several implementations have been dem- 
onstrated for micron-sized particles (7, 8). 

Because this fabrication approach is 
limited to larger particles, a different strat- 
egy—molecular patterning—was used to 
create patchy nanoscale particles (9, 10). 
This third concept provides the highest 
degree of control over cluster formation
by using an underlying molecular (2, 3, 11)
or nano-object (12) scaffold that prescribes 
the position of each nanoparticle and co- 
ordinates them in a targeted cluster. Thus, 
the spectrum of cluster-assembly methods 
ranges from a packing-based strategy with 
minimal particle engineering, to more 
elaborate patchy-particle strategies with 
designed particle-based directional inter- 
actions, to fully prescribed scaffold-based 
clusters. Among these diverse methods, 
none mimics a special aspect of inter- 
atomic bonds: Valence electrons, although 
delocalized within each atom, still sup- 
port formation of well-defined molecules 
through directional bonds. 

The new nanoparticle assembly strategy 
of Yi et al. resembles the “delocalized” fea- 
ture of atomic systems. It 
relies on polymer-mediated 
reactions and interactions 
to regulate binding between 
two types of gold spheri- 
cal nanoparticles (A and 
B) (see the figure). To form 
polymeric shells, the au- 
thors grafted two kinds of 
block copolymers contain- 
ing either acid (a) or base 
(b) groups to the surfaces 
of these particles. The co- 
polymer design allowed for 
control over the number and arrangement 
of reactive groups and inclusion of a chain 
portion on the outer part of the shells, so 
as to tailor steric repulsion between the 
shells or enable hydrophobic or hydrophilic 
interactions. 

A neutralization reaction between the 
acid groups of the charged copolymers on 
one particle and base groups of neutral 
copolymers on the other particle drove 
the attractive interparticle interactions. 
Controlling the number of acid and base 
groups within grafted chains and the number
of chains per particle allowed regulation
of the ratio (Z_a/b) between acid and
base groups. In this approach, Z_a/b represents
an effective valence for AB binding
(see the figure). A neutralization reaction
should progress until all the acid and base
groups are reacted. This might require a
specific reaction stoichiometry between
particles A and B, which should be satisfied
in the formation of an AB_n cluster. Yi
et al. experimentally observed a direct
correspondence between Z_a/b and n.

The polymeric properties of the interacting
shells proved critical for realization of
the “delocalized” concept. The polymeric 
chains formed a weakly localized “cloud” 
of acid and base groups within the shell of 
each particle, which resembled delocalized 
electrons in atomic orbitals. After neutralization,
polymer chains adopted conformations
that permitted acid and base groups
to reorganize and react to the fullest extent
possible. Thus, the number of B particles
that reacted with A particles was fully determined
by Z_a/b. Moreover, given the
steric repulsion between the particles 
and the Coulombic repulsion between the 
charged polymers in the bonds, the result- 
ing nanoparticle clusters accommodated a 
spatial configuration that possessed equal 
angles between the B particles, thus leading 
to the demonstrated direc- 
tional binding. The system 
also exhibited a self-limiting 
mechanism for cluster as- 
sembly, because the result- 
ing reorganization of the 
grafted chains depleted the 
base groups in the outer 
hemisphere of B particles
(see the figure). Electrostatic 
repulsion by the charged 
copolymers further reduced 
clustering beyond the desig- 
nated stoichiometry. 
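
The relationship between reaction stoichiometry and cluster geometry described above can be sketched numerically. The snippet below is a minimal illustration, not the authors' model: it assumes that one A particle binds as many B particles as are needed to neutralize all of its acid groups, and that the B particles then sit at mutually equal angles around A. The function names and the group counts are hypothetical.

```python
import math


def cluster_size(acid_groups_on_a: int, base_groups_per_b: int) -> int:
    """Number of B particles needed to neutralize one A particle.

    Assumption for this sketch: the acid:base ratio is a whole number and
    every acid and base group reacts.
    """
    if acid_groups_on_a <= 0 or base_groups_per_b <= 0:
        raise ValueError("group counts must be positive")
    if acid_groups_on_a % base_groups_per_b != 0:
        raise ValueError("this sketch only handles whole-number acid:base ratios")
    return acid_groups_on_a // base_groups_per_b


def equal_bond_angle_deg(n: int) -> float:
    """B-A-B angle when n B particles sit at mutually equal angles around A.

    In three dimensions this is only possible for n = 2 (linear),
    n = 3 (trigonal planar), and n = 4 (tetrahedral).
    """
    if n < 2 or n > 4:
        raise ValueError("mutually equal angles exist only for n = 2, 3, 4")
    return math.degrees(math.acos(-1.0 / (n - 1)))


if __name__ == "__main__":
    # Hypothetical group counts, chosen only to give ratios of 2, 3, and 4.
    for acid, base in [(8, 4), (12, 4), (16, 4)]:
        n = cluster_size(acid, base)
        print(f"acid:base = {acid}:{base} -> n = {n}, angle = {equal_bond_angle_deg(n):.1f} deg")
```

Under these assumptions, ratios of 2, 3, and 4 give linear, trigonal planar, and tetrahedral arrangements, respectively.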

Given the large variety of copolymer 
structures, this new methodology offers 
the ability to assemble with ease a de- 
sired class of nanoparticle architectures. 
Further, by making use of copolymer motifs
with hydrophobic and hydrophilic
shells, the clusters can be hierarchically as- 
sembled into more complex organizations, 
possibly permitting the creation of highly 
engineered nanoparticle-based materials.

CORONAVIRUS


Coronavirus 
dons a 
new crown 


A transmembrane pore 
reveals parallels 
with other viruses 


By Nuruddin Unchwaniwala
and Paul Ahlquist

Severe acute respiratory syndrome
coronavirus 2 (SARS-CoV-2), which
causes coronavirus disease 2019
(COVID-19), belongs to the positive-
strand RNA [(+)RNA] viruses, a
large class of viruses that includes
Zika, hepatitis C, and chikungunya viruses. 
(+)RNA viruses package their genomes 
in infectious virions as messenger-sense 
RNA and reproduce these genomes solely 
through RNA intermediates in replication 
complexes (RCs) formed by rearranging intracellular
membranes (1). RNA replication
is a major target of antiviral drugs, includ- 
ing remdesivir, which shows promise for 
treating COVID-19 patients. RCs of corona- 
viruses and some other (+)RNA viruses are 
~250- to 300-nm-diameter double-mem- 
brane vesicles (DMVs) that contain viral 
double-stranded RNA (dsRNA) replication 
intermediates (2-4). On page 1395 of this 
issue, Wolff et al. (5) identify a crown-like 
double-membrane-spanning molecular pore 
on SARS-CoV-2 and other coronavirus 
DMVs that likely solves the longstanding
problem of how progeny (+)RNA genomes 
are released from DMVs. 

Like other (+)RNA viruses, most (~70%) 
of the SARS-CoV-2 genome encodes func- 
tions for RNA replication, underscoring 
the importance of this process for under- 
standing and controlling these viruses. RCs 
support genome replication by organizing 
viral RNA replication proteins, viral RNA 
templates, specific host factors required for 
RNA replication, and successive reproduc- 
tive steps. The RC-bounding membranes 
sequester RNA replication templates and 
intermediates from translation, virion assembly,
RNA decay, and host defenses such
as RNA interference and interferon-stimu- 
lated antiviral responses. 

Although infection by coronaviruses in- 
duces several types of membrane rearrange- 
ments, multiple lines of evidence identified 
the dsRNA-containing DMVs as the viral 
RNA synthesis sites (2, 6). However, be- 
cause DMVs lacked known openings, it was 
unclear how new (+)RNA genomes copied 
from dsRNA templates in the DMV interior 
could transit to the cytoplasm to be trans- 
lated, packaged into virions, and potentially 
form new RCs. 

Wolff et al. provide a compelling solution 
to this conundrum by using advanced cryo- 
electron tomography (cryo-ET) to identify a 
cylindrical protein pore complex traversing 
DMV double membranes in cells infected 
by SARS-CoV-2 or another coronavirus (see 
the figure). They also showed that this pore 
contains six copies of the large viral trans- 
membrane protein nsp3 (nonstructural 
protein 3), which is essential for RNA repli- 
cation and induces the formation of DMVs 
with viral nsp4. Consistent with the interac- 
tion of nsp3 with multiple viral replication 
proteins, the authors imaged frequent, ap- 
parently dynamic interaction of the pore’s 
DMV luminal and cytoplasmic sides with 
other macromolecules. Thus, the pore may 
interact with the viral RNA polymerase and 
other luminal RNA replication factors to 
guide newly synthesized RNAs to the cyto- 
plasm, where the interaction of nsp3 with 
the viral nucleocapsid protein may facilitate 
RNA packaging into new virions. 

The DMV pore is also an attractive solu- 
tion for coronaviral RNA release because 
of similarities with a product RNA-release 
channel first identified in nodaviruses, a 
well-characterized model for (+)RNA virus 
replication. In addition to DMVs, another 
prominent class of RCs is necked spherular 
membrane invaginations (spherules) that 
are formed by numerous families of (+)RNA 
viruses including flaviviruses, such as Zika 
virus (7, 8), alphaviruses, such as chikungu- 
nya virus (9), and nodaviruses (10). Cryo-ET 
showed that the ~50- to 80-nm-diameter 
nodavirus spherules (see the figure) contain 
viral dsRNA replication templates and that 
the cytoplasmic side of the spherule neck is 
surmounted by a ring, or crown, of 12 copies 
of nodavirus RNA replication protein A (11, 
12). These crowns are frequently origins for 
cytoplasmic filaments that appear to repre- 
sent new (+)RNA genomes being released. 
Crown-forming protein A contains all vi- 
ral activities for RNA synthesis and shares 
distant sequence similarities to alphavirus 
RNA replication proteins (11-13).

Replication complex crowns
RNA replication complexes (RCs) are distinct types
of membrane compartments containing double-
stranded RNA (dsRNA). A crown-like pore complex on
coronavirus RCs (5) parallels a similar crown of viral
proteins on nodavirus RCs (12), providing a channel
to release RNA progeny to the cytoplasm.

Despite some differences in the membrane
organization of DMV and spherule
RCs, the DMV pores of coronaviruses show
multiple parallels with nodavirus spherule 
crowns. These include their ringed mul- 
timeric structure, their apparent role as 
channels to release progeny RNA replica- 
tion products, and potential involvement 
or interaction with active RNA synthesis. 
Consistent with these similarities, Wolff et 
al. also refer to the cytosolic portion of the 
coronaviral pore as a crown. However, it 
is important to note that this intracellular 
DMV RC crown is unrelated to the crown- 
like halo of virion envelope spike proteins 
that gave coronaviruses their name (14). 
The recognition that RCs of coronavi- 
ruses and nodaviruses share fundamentally 
similar crown-like channels is notable given 
the considerable evolutionary separation of 
these viruses. Additional studies will fur- 
ther define the similarities and differences 
between these crowns, with functional 
and potentially evolutionary implications. 
Although the cytosolic portion of the coronavirus
crown has sixfold symmetry, some
membrane-interacting regions of the pore
show rings of 12 similar electron-dense ele- 
ments. Because the volume of the coronavi- 
ral pore complex shows that it must contain 
additional proteins beyond nsp3, the stoi- 
chiometry and symmetry of these regions 
remain uncertain. Even if only sixfold symmetric,
do these 12-member rings bear any
structural similarities to the 12-fold sym- 
metric nodavirus crown? 

Detailed interactions of these crowns with 
membrane lipids will also be of interest and 
might be more similar than seeming differ- 
ences in DMV and spherule architectures 
suggest. The nodavirus spherule membrane 
folds at the neck into two sections that in- 
dependently approach the crown, mirror- 
ing the connection of double membranes 
at nuclear pores. Similarly, hydrophobic 
surfaces on the coronaviral pore might in- 
duce lipids in the two DMV membranes to 
converge or interact, again approximating 
nuclear pores. Higher-resolution imaging of 
the nodavirus crown (12) bodes well for ad- 
dressing such questions. 

Additional questions include what other 
coronaviral and perhaps host proteins com- 
prise the crown-like DMV channel. Viral 
nsp4 is one attractive candidate because it 
functions with nsp3 to form DMVs. Equally 
enticing is the identity of the dynamic RNA 
and protein interactions at both ends of 
the DMV channel and how these may pro- 
mote RNA synthesis, transport, and virion 
assembly. Perhaps most important, recog- 
nizing conserved processes such as crown- 
mediated release of new genomic RNAs 
should provide a foundation for potentially 
broader-spectrum control, through phar- 
macologic or genetic means, of ubiquitous 
(+)RNA virus pathogens.

RETROSPECTIVE


Flossie Wong-Staal (1946-2020) 


Trailblazing HIV researcher 


By Genoveffa Franchini 


Flossie Wong-Staal, a leader in HIV
research at the onset of the AIDS epidemic,
died on 8 July at the age of 73.
A pioneer in the genetic structure and
regulatory mechanisms of HIV, Wong-
Staal played a key role in showing that
HIV causes AIDS. She was also a trailblazer 
for female scientists. 

Born in Guangzhou, China, on 27 August 
1946 as Yee Ching Wong, Wong-Staal moved 
to Hong Kong with her family when she was 
7 years old. At the age of 18, she western- 
ized her name (taking the name “Flossie” 
from a typhoon that had recently hit south- 
ern China) and emigrated to the United 
States. The first woman in her family to 
obtain a higher education, she graduated 
with a bachelor’s degree in bacteriology in 
1968 and a Ph.D. in molecular biology in 
1972, both from the University of California, 
Los Angeles. In 1973, Wong-Staal began a 
postdoctoral position at the Laboratory of 
Tumor Cell Biology, led by biomedical re- 
searcher Robert C. Gallo, in the National 
Cancer Institute of the National Institutes 
of Health (NIH) in Bethesda, Maryland. 
She was promoted to section chief within 
a few years, and in the next decade she co- 
authored more than 100 journal articles. 

In 1990, Wong-Staal left the NIH to ac- 
cept an appointment as the Florence Seeley 
Riford Chair in AIDS Research at the 
University of California, San Diego (UCSD). 
She was named director of the newly created 
UCSD Center for AIDS Research in 1994 and 
began pioneering the investigation of gene 
therapy approaches for HIV/AIDS. After her 
retirement from UCSD in 2002, she became 
vice president of Immusol, a biopharmaceu- 
tical company she cofounded, now known 
as iTherX Pharmaceuticals, where she pur- 
sued treatments for hepatitis C. Throughout 
her career, Wong-Staal trained a large num- 
ber of postdoctoral fellows, many of whom 
went on to be leaders in their fields. 

When Gallo’s team cultured the first hu- 
man retrovirus in the late seventies, Wong- 
Staal was focused on retroviruses that 
caused leukemia in animals. She quickly 
pivoted to follow up on Gallo’s work, and her
section became the leading group working
on the molecular biology of human retrovi- 
ruses. In these early years of research on hu- 
man retroviruses, the skepticism regarding 
their existence was widespread. Gallo later 
framed a letter from a reviewer who, citing 
the “fact” that there were no human retro- 
viruses, rejected the initial human T cell 
leukemia virus (HTLV) paper. But Wong- 
Staal was among those who thought there 
was a connection between retroviruses and 
human diseases, and she was proved right. 
The molecular virology skills Wong-Staal 
brought to her laboratory were critical to 
her ability to apply state-of-the-art molecular
biology techniques to quickly unravel
the HIV genome organization and its replication
strategies. In the early 1980s, Wong-
Staal discovered molecular evidence of vari- 
ations in HIV within and among infected 
individuals, which led to a fundamental 
realization: HIV is constantly mutating in 
response to immune pressures, so every iso- 
lation of the virus results in different virus 
clones. This understanding shaped the de- 
velopment of effective antiviral therapies to 
manage AIDS. Wong-Staal also provided the 
molecular biology necessary for the devel- 
opment of the second-generation blood test 
for HIV, one based on detection of the viral 
genome rather than antibodies to the virus. 
Her groundbreaking work on the molecular 
biology of HIV inspired scientists worldwide
to join the field of human retrovirology,
an entirely uncharted but increasingly
exciting area of research in the 1980s and 
early 1990s. 

Wong-Staal’s contributions were not lim- 
ited to HIV. She had a keen interest in the 
molecular virology of HTLV-1, the retrovi- 
ral causative agent of human adult T cell 
leukemia. Her work on nonstructural viral 
factors such as the HTLV-1 and HIV-1 tran- 
scriptional activators Tax and Tat and the 
posttranscriptional regulators Rex and Rev
also had far-reaching implications in areas 
of basic biology, including transcription 
regulation and RNA transport. 

I joined Wong-Staal’s lab in 1979 as a 
postdoctoral fellow. She was a very talented 
scientist who had an exceptional ability 
to sharply analyze data, focus, and move 
quickly to address the most essential re- 
search questions. She was willing to take 
risks and propose daring hypotheses, al- 
ways expanding her knowledge and moving 
research forward. This mindset extended 
beyond her lab work. Decades ago, for a pre- 
sentation at an AIDS meeting, she told the 
audience that instead of using the standard 
slides, she was going to give her presenta- 
tion using a program called PowerPoint that 
she had learned about from her daughter. 
By the next meeting, there was not a slide 
presentation in sight; everyone was using 
PowerPoint. 

A stylish, elegant, and confident woman 
with a great sense of humor, Wong-Staal 
was competitive and tenacious. When she 
submitted her first grant application from 
UCSD after leaving the NIH, the reviewers 
did not give her a fundable score because 
they thought that, as a molecular biologist, 
she did not have the immunology experi- 
ence required to carry out the proposed 
studies. In response, she conducted the 
study anyway, published the data, and sent 
the publication with her next grant appli- 
cation. In this way, Wong-Staal taught me 
persistence and resilience, skills that I have 
found to be invaluable in my career. 

Flossie Wong-Staal held her own and 
gained respect in a male-dominated sci- 
entific world through her strength, intelli- 
gence, kindness, and grace. She was a mem- 
ber of the National Academy of Medicine. 
In 2002, Discover magazine named her one 
of the 50 most important women in science. 
In 2019, she was inducted into the National 
Women’s Hall of Fame, along with Angela 
Davis, Jane Fonda, and Sonia Sotomayor. It 
was a well-deserved honor for an influential 
researcher, who served as a role model to 
mentees and colleagues and will continue 
to inspire future generations of scientists.

ETHICS: COVID-19


An ethical framework for 
global vaccine allocation 


The Fair Priority Model offers a practical way to fulfill
pledges to distribute vaccines fairly and equitably


By Ezekiel J. Emanuel, Govind Persad,
Adam Kern, Allen Buchanan, Cécile Fabre,
Daniel Halliday, Joseph Heath, Lisa Herzog,
R. J. Leland, Ephrem T. Lemango, Florencia
Luna, Matthew S. McCoy, Ole F. Norheim,
Trygve Ottersen, G. Owen Schaefer,
Kok-Chor Tan, Christopher Heath Wellman,
Jonathan Wolff, Henry S. Richardson


Once effective coronavirus disease 2019
(COVID-19) vaccines are developed,
they will be scarce. This presents the
question of how to distribute them
fairly across countries. Vaccine allocation
among countries raises
complex and controversial issues involving
public opinion, diplomacy, economics, public
health, and other considerations. Nevertheless,
many national leaders, international organizations,
and vaccine producers recognize
that one central factor in this decision-making
is ethics (1, 2). Yet little progress has been
made toward delineating what constitutes
fair international distribution of vaccine.
Many have endorsed “equitable distribution
of COVID-19...vaccine” without describing a
framework or recommendations (3, 4). Two
substantive proposals for the international
allocation of a COVID-19 vaccine have been
advanced, but are seriously flawed. We offer
a more ethically defensible and practical
proposal for the fair distribution of COVID-19
vaccine: the Fair Priority Model.

The Fair Priority Model is primarily addressed
to three groups. One is the COVAX
facility, led by Gavi, the World Health Organization
(WHO), and the Coalition for
Epidemic Preparedness Innovations (CEPI),
which intends to purchase vaccines for fair
distribution across countries (5). A second
group is vaccine producers. Thankfully, many
producers have publicly committed to a
“broad and equitable” international distribution
of vaccine (2). The last group is national
governments, some of whom have also publicly
committed to a fair distribution (1).

These groups need a clear framework for 
reconciling competing values, one that they 
and others will rightly accept as ethical and 
not just as an assertion of power. The Fair 
Priority Model specifies what a fair distri- 
bution of vaccines entails, giving content to 
their commitments. Moreover, acceptance of 
this common ethical framework will reduce 
duplication and waste, easing efforts at a fair 
distribution. That, in turn, will enhance pro- 
ducers’ confidence that vaccines will be fairly 
allocated to benefit people, thereby motivat- 
ing an increase in vaccine supply for interna- 
tional distribution. 


VACCINE NATIONALISM 

Those who think countries will inevitably 
engage in “vaccine nationalism” (4) may 
deem an ethical framework for vaccine dis- 
tribution among countries irrelevant. Public 
sentiment in some countries for retaining 
vaccine developed within their borders is 
strong, and many governments will also try 
to obtain vaccines produced elsewhere. But 
an ethical framework has broad relevance 
even in the face of nationalist attitudes. 
Rather than simply asserting that might 
makes right, governments typically appeal 
to national partiality: a country’s right and 
duty to prioritize its own citizens. 

Some defend national partiality as ethical 
(6-8). Fellow citizens share “associative ties,” 
common governmental, civic, and other in- 
stitutions, and a sense of shared identity (6, 
7). Also, the legitimate authority of representative
government officials inheres in their
representing and promoting the interests of
their citizens. Plausibly, these relations sup- 
port allowing countries to prioritize citizens 
over foreigners for vaccines (6). Others view 
national partiality as unethical: People’s en- 
titlement to lifesaving resources should not 
depend on nationality (9). 

Regardless of whether some national 
partiality is ethical, unlimited national 
partiality is not (6-8). Associative ties only 
justify a government’s giving some prior- 
ity to its own citizens, not absolute priority 
(6). Moreover, associative ties extend across 
national borders, and citizens of different 
countries share common institutions (7). 
Finally, national governments have cross- 
border responsibilities to help satisfy fun- 
damental needs like basic health care, par- 
ticularly in a global health emergency (7). 

Reasonable defenders of national partial- 
ity will differ on how much priority coun- 
tries should give their citizens for vaccines. 
To establish the need for an equitable inter- 
national distribution, it is unnecessary to 
determine an optimal level of priority. It is 
sufficient to identify a clear upper bound: 
Reasonable national partiality does not per- 
mit retaining more vaccine than the amount 
needed to keep the rate of transmission (Rt) 
below 1, when that vaccine could instead mit- 
igate substantial COVID-19-related harms in 
other countries that have been unable to keep 
Rt below 1 through ongoing public-health 
efforts. The marginal benefit of additional 
doses of vaccine in a country able to keep Rt 
below 1 generally will pale in comparison to 
the potential benefits to countries whose Rt 
remains above 1—at least until booster vac- 
cination is needed to maintain immunity. 
Hence, with Rt below 1, there will not be suf- 
ficient vaccine-preventable harm to justify re- 
taining vaccine. When a government reaches 
the limit of national partiality, it should re- 
lease vaccines for other countries. This makes 
an account of fair allocation among countries 
relevant to reasonable national governments. 
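
To make the upper bound concrete, the sketch below estimates the number of doses at which a country would reach the Rt = 1 threshold. It is only an illustration under strong assumptions that are not part of the Fair Priority Model itself: a homogeneously mixing population, a vaccine that blocks transmission with a fixed efficacy, and the textbook threshold coverage (1 - 1/R) / efficacy. The function name, parameters, and example numbers are hypothetical.

```python
import math


def doses_to_reach_rt_threshold(
    population: int,
    r_without_vaccine: float,
    efficacy: float,
    doses_per_person: int = 1,
) -> int:
    """Doses needed to bring Rt down to the threshold value of 1.

    Assumes homogeneous mixing and an all-or-nothing vaccine, so the
    required coverage is (1 - 1/R) / efficacy, capped at 100%.
    """
    if population < 0 or doses_per_person < 1:
        raise ValueError("population must be >= 0 and doses_per_person >= 1")
    if not 0.0 < efficacy <= 1.0:
        raise ValueError("efficacy must be in (0, 1]")
    if r_without_vaccine <= 1.0:
        # Rt is already at or below 1 through other public-health efforts,
        # so under the stated bound no retention is justified on this ground.
        return 0
    coverage = min((1.0 - 1.0 / r_without_vaccine) / efficacy, 1.0)
    return math.ceil(population * coverage * doses_per_person)


if __name__ == "__main__":
    # Hypothetical country: 1 million people, R = 2 without vaccine.
    print(doses_to_reach_rt_threshold(1_000_000, 2.0, 1.0))
```

In this reading, doses beyond the returned number would fall outside the limit of reasonable national partiality and should be released for other countries.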


THREE FUNDAMENTAL VALUES 

Fairly distributing a COVID-19 vaccine 
among countries is a problem of distribu- 
tive justice. Although governments will be 
the initial recipients of vaccine, fair dis- 
tribution across countries must reflect a 
moral concern for the ultimate recipients: 
individuals. Three values are particularly 
relevant: benefiting people and limiting
harm, prioritizing the disadvantaged, and
equal moral concern. 

Benefiting people and limiting harm is 
widely recognized as important across ethical 
theories. Realizing this value requires defin- 
ing relevant benefits, measuring them, and as- 
sessing the relative urgency—the importance 
and time sensitivity—of countries’ needs. A 
successful vaccine produces direct benefits by 
protecting people against death and morbid- 
ity caused by infection. It also produces indi- 
rect benefits by reducing death and morbidity 
arising from health systems overstressed by 
the pandemic, and by reducing poverty and 
social hardship such as closed schools.


A family member prays at a relative’s grave in Comas, in the outskirts of Lima. Peru has one of the highest
COVID-19 death tolls among countries in Latin America and the Caribbean region.


Prioritizing the disadvantaged is a fundamental
value in ethics and global health (10,
11). Realizing this value requires that vaccine
distribution reflect special concern for people
who are disadvantaged. Fairly distributing a
COVID-19 vaccine internationally therefore
requires assessing different types of disadvantage.
Are the worst-off countries those experiencing
the greatest poverty? Those where
people have the lowest life expectancies?

Equal moral concern requires treating
similar individuals similarly and not discriminating
on the basis of morally irrelevant
differences, such as sex, race, and religion.
Distributing different quantities of vaccine to
different countries is not discriminatory if it
effectively benefits people while prioritizing
the disadvantaged.


THE FAIR PRIORITY MODEL

To guide fair distribution of vaccine across
countries, we propose the Fair Priority
Model. Fair allocation must seek to mitigate
future adverse effects of COVID-19. We focus
on three types of harms directly or indirectly
caused by COVID-19. First, COVID-19 kills
people and causes permanent organ damage.
Second, the pandemic indirectly harms
health even for the uninfected by straining
health care systems, raising mortality rates
for common conditions, causing stress that
harms mental health, and accelerating the
spread of disease by hindering immunizations.
Third, the pandemic has devastated
the global economy, causing unemployment,
economic decline, poverty, and starvation.
Economics and health interact: Worsening
economic conditions harm health, and a
worsening pandemic harms the economy.

The pandemic forces allocators to decide
where a vaccine’s harm-reducing powers are 
most urgently needed. Three dimensions of 
harm are important. Are the harms irrevers- 
ible? How devastating are they? And can they 
be compensated? 

On these three dimensions, prevent- 
ing death—especially premature death— 
is particularly urgent. Death is uniquely 
devastating, and those who die for want 
of vaccine cannot be compensated later 
on. Surveys further suggest popular agree- 
ment that a premature death that prevents 
someone’s exercising their skills or realiz- 
ing their goals later in life is worse than 
a death later in life (11, 12). Ethicists have
similarly argued that preventing early 
deaths—deaths that are more prevalent 
in poorer countries—is both prudent and 
ethical (10, 13).

Death, however, is not the only irrevers- 
ible and devastating harm. COVID-19 causes 
strokes and organ damage with long-term 
consequences. It also diminishes education
and causes unemployment and poverty that
impose long-term devastation. 

The Fair Priority Model proceeds in three 
phases, preventing more urgent harms ear- 
lier (see the Table). Phase 1 aims at reduc- 
ing premature deaths and other irrevers- 
ible direct and indirect health impacts. 
Phase 2 continues to address enduring 
health harms but additionally aims at re- 
ducing serious economic and social depri- 
vations such as the closure of nonessential 
businesses and schools. Restoring these 
activities will lower unemployment, reduce 
poverty, and improve health. Finally, phase 
3 aims at reducing community transmis- 
sion, which in turn reduces spread among 
countries and permits the restoration of 
prepandemic freedoms and economic and 
social activities. 

Implementing each phase of the model 
requires determining the number of vaccine 
doses each country should receive and the 
order of receipt. The countries will then al- 
locate vaccine internally to individuals. We 
expect that they will initially focus on areas 
where premature mortality can be reduced. 
Determining how many vaccine doses are 
allocated to each country depends on the 
marginal improvement in ethically relevant 
metrics that each dose achieves. There are 
likely to be multiple distributions of vaccine 
as supply becomes available over time. 
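
The allocation rule described above can be read as a greedy procedure: each successive batch of doses goes to the country where it yields the largest marginal improvement in the chosen metric. The Python sketch below is illustrative only and is not part of the model as published. The function name `allocate_batches`, the country labels, and every benefit figure are hypothetical, and the sketch assumes that each country's marginal benefit does not increase as it receives more batches, in which case a greedy pass maximizes the total benefit.

```python
import heapq


def allocate_batches(marginal_benefit, total_batches):
    """Give each batch to the country whose next batch averts the most harm.

    marginal_benefit maps a country to a list whose entry k is the
    (hypothetical) benefit of that country's (k+1)-th batch of doses.
    """
    allocation = {country: 0 for country in marginal_benefit}
    # heapq is a min-heap, so benefits are negated to pop the largest first.
    heap = [(-gains[0], country)
            for country, gains in marginal_benefit.items() if gains]
    heapq.heapify(heap)
    for _ in range(total_batches):
        if not heap:
            break  # no country can benefit from further batches
        _, country = heapq.heappop(heap)
        allocation[country] += 1
        next_index = allocation[country]
        gains = marginal_benefit[country]
        if next_index < len(gains):
            heapq.heappush(heap, (-gains[next_index], country))
    return allocation


# Hypothetical benefit per batch (for example, SEYLL averted), diminishing.
example = {
    "Country A": [9.0, 6.0, 2.0],
    "Country B": [7.0, 5.0, 4.0],
    "Country C": [3.0, 1.0],
}
result = allocate_batches(example, 4)
```

Because supply arrives over time, the same procedure could be rerun for each new tranche with updated benefit estimates.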

Five factors guide the choice of metrics 
for each phase: (i) fidelity to the underlying 
ethical values; (ii) simplicity; (iii) previous 
use in global health and development; (iv) 
ease of obtaining rapid but reasonable esti- 
mates as the pandemic evolves; and (v) sen- 
sitivity to relevant harms that are difficult 
to measure directly. 

In phase 1, we propose using Standard 
Expected Years of Life Lost (SEYLL) averted 
per dose of vaccine as the metric for premature
death (14). SEYLL calculates life years
lost compared to a standardized reference 
life table—that is, a person’s life expectancy 
at each age as estimated on the basis of the 
lowest observed age-specific mortality rates 
anywhere in the world. 
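
As a rough illustration of the metric just defined, SEYLL can be computed as the sum, over ages at death, of the number of deaths multiplied by the remaining life expectancy at that age in the reference table. The sketch below is a minimal Python example under stated assumptions: the reference values, death counts, and dose numbers are invented for illustration, and the function names are hypothetical rather than drawn from any published tool.

```python
def seyll(deaths_by_age, reference_life_expectancy):
    """Sum of deaths at each age times reference remaining life expectancy."""
    return sum(
        count * reference_life_expectancy[age]
        for age, count in deaths_by_age.items()
    )


def seyll_averted_per_dose(deaths_averted_by_age, reference_life_expectancy, doses):
    """SEYLL averted by a vaccination program, divided by doses used."""
    return seyll(deaths_averted_by_age, reference_life_expectancy) / doses


# Hypothetical reference table: remaining life expectancy at selected ages.
# The same table is applied to every country.
REFERENCE = {30: 58.0, 60: 30.0, 80: 12.0}

# Hypothetical deaths averted by 1000 doses in two countries.
country_x = {30: 10, 60: 20, 80: 50}   # 80 deaths averted, many of them early
country_y = {30: 0, 60: 10, 80: 80}    # 90 deaths averted, mostly late in life

x_per_dose = seyll_averted_per_dose(country_x, REFERENCE, 1000)
y_per_dose = seyll_averted_per_dose(country_y, REFERENCE, 1000)
```

In this invented example, country Y averts more deaths, but country X averts more SEYLL per dose because its averted deaths occur at younger ages.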

SEYLL has three major advantages. First, 
it regards all deaths as important but ear- 
lier deaths as particularly important. Thus, 
it integrates the aims of limiting harm and 
of prioritizing the least advantaged, par- 
ticularly because early deaths are more 
frequent in low-income countries and are a 
proxy for being disadvantaged overall (10). 
Second, SEYLL incorporates equal moral 
concern by valuing a life saved at a given 
age identically across countries, regardless 
of preexisting conditions or differences in 
national life expectancy. Finally, SEYLL is a 
standard metric used in global burden-of- 
disease calculations (14). 

Table. Three phases of Fair Vaccine Distribution

PHASE 1: Reducing premature deaths
  PRIMARY AIM: Reducing foreseeable premature deaths directly or
    indirectly caused by COVID-19.
  METRIC TO DISTRIBUTE VACCINE DOSES: Standard expected years of life
    lost (SEYLL) averted by administering vaccine.
  HOW THE METRIC FULFILLS VALUES: Prevents substantial harms and gives
    priority to the worst-off by giving weight to premature deaths.
    Recognizes equal moral concern by valuing a life saved at a given
    age identically across countries.
  PRIORITIZATION: Priority to countries that would reduce more SEYLL
    per dose of vaccine.

PHASE 2: Reducing serious economic and social deprivations
  PRIMARY AIM: Reducing serious economic, social, and fatal and
    nonfatal health harms caused by COVID-19.
  METRIC TO DISTRIBUTE VACCINE DOSES: SEYLL averted. Reduction in
    absolute poverty measured by poverty gap. Declines in gross
    national income (GNI) averted by administering vaccine.
  HOW THE METRIC FULFILLS VALUES: Prevents harm by recognizing a wide
    range of economic, social, and health deficits. Gives priority to
    the worst-off by prioritizing people in poverty.
  PRIORITIZATION: Priority to countries that would reduce more poverty,
    avert more loss of GNI, and avert more SEYLL per dose of vaccine.

PHASE 3: Returning to full functioning
  PRIMARY AIM: Ending community spread of COVID-19.
  METRIC TO DISTRIBUTE VACCINE DOSES: Ranking of different countries'
    transmission rates.
  HOW THE METRIC FULFILLS VALUES: Prevents harm and gives priority to
    the worst-off by prioritizing countries with higher transmission
    rates.
  PRIORITIZATION: Priority to countries with higher transmission rates.

Phase 2 retains SEYLL as the health met- 
ric, treating it as a mortality measure and a 
proxy for morbidity. The novelty and uncer- 
tain long-term effects of COVID-19 preclude 
using more typical measures of morbidity, 
such as Years Lived with Disability. 

No single socioeconomic metric integrates
benefiting people and prioritizing
the disadvantaged. Accordingly, we propose 
two metrics for phase 2 that capture overall 
economic improvement and the extent to 
which people would be spared from poverty. 
Because poverty is an extreme form of depri- 
vation, people’s moral claim to avoid poverty 
is especially urgent. The Fair Priority Model 
measures poverty by the projected reduction 
in the absolute size of the poverty gap per 
dose of vaccine, with the poverty line set at 
a uniform absolute level to be selected by the 
implementers. The poverty gap is the ratio by 
which the mean income of the poor falls be- 
low the poverty line; it accounts for both the 
prevalence and depth of poverty. Overall eco- 
nomic impact is measured by the projected 
absolute improvement in gross national in- 
come (GNI) per vaccine dose. Considering 
absolute improvement in GNI per dose is 
preferable to considering improvement in 
per capita GNI or percentage improvement 
in GNI, which would favor countries with 
smaller populations or economies and per- 
mit unnecessary harm without prioritizing 
the disadvantaged. Moreover, increased GNI 
in one country will also lead to cross-border 
gains through trade, employment, and trans- 
fers. These simple economic metrics combine 
to ensure that vaccines prevent substantial 
harms and prioritize the disadvantaged. 
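
The poverty metric described above can be illustrated with a short calculation. The Python sketch below assumes the conventional poverty gap index, in which each person's shortfall below the poverty line is averaged over the whole population and expressed as a fraction of the line, so that both prevalence and depth of poverty count. The incomes, the poverty line, the dose count, and the function names are all hypothetical, and real estimates would come from household survey data and economic forecasts.

```python
def absolute_poverty_gap(incomes, poverty_line):
    """Total income shortfall of everyone below the poverty line."""
    return sum(max(poverty_line - income, 0.0) for income in incomes)


def poverty_gap_index(incomes, poverty_line):
    """Mean shortfall across the whole population, as a fraction of the line."""
    return absolute_poverty_gap(incomes, poverty_line) / (len(incomes) * poverty_line)


def gap_reduction_per_dose(incomes_without, incomes_with, poverty_line, doses):
    """Projected reduction in the absolute poverty gap per vaccine dose."""
    reduction = (absolute_poverty_gap(incomes_without, poverty_line)
                 - absolute_poverty_gap(incomes_with, poverty_line))
    return reduction / doses


# Hypothetical daily incomes for four people, with and without vaccination,
# measured against a uniform absolute poverty line chosen by implementers.
POVERTY_LINE = 2.0
without_vaccine = [1.0, 1.5, 2.5, 4.0]
with_vaccine = [1.5, 2.0, 2.5, 4.0]

index_before = poverty_gap_index(without_vaccine, POVERTY_LINE)
per_dose = gap_reduction_per_dose(without_vaccine, with_vaccine, POVERTY_LINE, doses=4)
```

Using the absolute shortfall rather than a percentage keeps the measure from favoring countries with smaller populations, which mirrors the reasoning given for absolute GNI.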

In phase 3, countries with higher trans- 
mission rates are initially prioritized, but all 
countries should eventually receive sufficient 
vaccine to halt transmission, which is pro- 
jected to require that 60 to 70% of the popu- 
lation be immune. 


FLEXIBILITY OF THE MODEL

Specifying how vaccines should be allocated
will require integration of the model with
data and empirical forecasts. For instance, in
phase 1, minimizing SEYLL might mean im- 
munizing those at high risk of death, those 
most likely to transmit infection, or those 
most at risk of initial infection. The vaccina- 
tion strategy that best averts SEYLL depends 
on each country’s demography, prevalent 
comorbidities, and health system capacity, 
as well as open scientific questions: Will vac- 
cines reduce severity but not transmission, be 
less effective in the elderly, or require peri- 
odic boosters? The WHO’s Strategic Advisory 
Group of Experts is currently evaluating how 
much harm each strategy prevents. Similarly, 
the World Bank is evaluating the impact of 
COVID-19 on countries’ economic activity 
and world poverty. These or similar organi- 
zations can provide the analytic forecasts 
to guide actual distribution of the vaccines 
over time by the COVAX facility or vaccine 
producers. By specifying metrics that should 
guide allocation and monitoring the vaccine’s 
effect on outcomes, the Fair Priority Model 
naturally accommodates changes in our 
knowledge of COVID-19. 

How much vaccine should be distributed 
in each phase? Empirical uncertainty makes 
it impractical to fully specify the transition 
between phases now. However, distributors 
might set the first transition at the point 
where a vaccine successfully reduces the 
burden of COVID-19 from an emergency to 
the level of established health challenges. 
For example, phase 2 might commence once 
a vaccine reduces worldwide SEYLL due to 
COVID-19 to a level analogous to the burden 
of influenza. Similarly, the transition to phase 
3 might begin once additional vaccines either 
successfully narrow the poverty gap to pre- 
pandemic levels or encounter substantially 
diminishing returns in that effort. Because 
the distribution of vaccine doses among coun- 
tries is linked to the impact of the vaccine on 
common worldwide metrics, all countries 
should progress to the next phase approxi- 
mately simultaneously. This is approximate; 
some countries may struggle to control their 
outbreaks even with vaccine, but that should 
not preclude the rest of the world progressing
to the next phase. Although we have delineated
the ethical framework and metrics,
epidemiological and economic assessments 
using the best available data will be needed 
to help determine when a phase should be 
considered complete. 


COMPARISON WITH OTHER PROPOSALS 
Two schemes for the international dis- 
tribution of COVID-19 vaccine have been 
proposed. First, the WHO suggests that 
countries receive doses proportional to pop- 
ulation in phase I (15). Phase I begins with 
3% of each country’s population receiving 
vaccines, and population-proportional allo- 
cation continues until every country has vac- 
cinated 20% of its population. The COVAX 
facility currently accepts this proposal, 
which is undergoing revision (5). 

A population-based distribution appears 
to express equal moral concern and may 
appear to be politically tenable. However, it 
mistakenly assumes that equality requires 
treating differently situated countries iden- 
tically rather than equitably responding 
to their different needs. Equally populous 
countries can face markedly different levels 
of premature death and economic devasta- 
tion from COVID-19. Aid to countries typi- 
cally is provided in approximate response 
to the severity of problems. Providing aid 
merely in proportion to population is unjus- 
tified and almost never used. For instance, 
it would be unethical to allocate antiretro- 
virals for HIV on the basis of population, 
rather than on HIV burden. Likewise, a fair 
distribution of COVID-19 vaccines should 
respond to the pandemic’s differential se- 
verity in different countries. 

The second proposal distributes vaccine to 
countries according to the number of front- 
line health care workers, the proportion of 
population over 65, and the number of people
with comorbidities in the country (15).
This proposal seems to prioritize protecting 
those judged most likely to die and prevent- 
ing health system collapse due to health care 
workers’ illness. But it is an empirical question
whether this prioritization optimally
reduces death, let alone premature death 
or serious economic harms. Preferentially 
immunizing health care workers may not 
substantially reduce harm in higher-income 
countries where personal protective equip- 
ment effectively protects health workers. 
Instead, vaccinating those whose housing or 
occupation or age puts them at greatest risk 
of spreading infection, or people at highest 
risk of becoming infected, might best pre- 
vent harm. Only data can determine which 
approach best fulfills the ethical value of re- 
ducing premature deaths. 

Further, because the second proposal does 
not use SEYLL to correct for disadvantages 
due to differential national life expectancy, 
it compounds disadvantage compared to 
the Fair Priority Model. Since low- and 
middle-income countries have fewer older 
residents and health care workers per cap- 
ita than high-income countries, this scheme 
allocates less vaccine to countries already 
disadvantaged by weaker health systems 
and shorter average life spans. 


OBJECTIONS CONSIDERED 

We consider three potential objections to the 
Fair Priority Model. First, some might argue 
that countries should receive vaccine only if 
they can provide assurance that they will dis- 
tribute it to minimize premature deaths and 
mitigate economic harms, and have the infra- 
structure to effectively do so. 

Allocating vaccine doses to countries lack- 
ing the infrastructure to administer them 
would unjustifiably waste a lifesaving re- 
source. Consequently, fair allocation may be 
conditional on infrastructural capacity and 
might also require efforts to help poorer 
countries develop such infrastructure. 

Conditioning vaccine on fair distribution 
within countries is more problematic. A fair 
distribution of emergency supplies ultimately 
aims at helping individuals: They are the 
ones who live or die, prosper or are impov- 
erished. Some authoritarian countries may 
do an excellent job of distributing vaccine to 
minimize health, economic, and other harms. 
As long as individuals benefit, fair global dis- 
tribution among countries should neither 
require that intranational distribution of a 
vaccine be perfectly just nor seek to punish 
unrelated injustices. However, some coun- 
tries may grossly mismanage their domestic 
vaccine allocations, by, for instance, hoard- 
ing doses for a ruling elite. Addressing such 
hoarding may require making actual vac- 
cine distribution among countries in subse- 
quent phases or subsequent tranches within 
a phase conditional on a country’s having 
distributed the vaccine reasonably fairly to 
its members. But outside of extreme cases, 
withholding vaccines to enforce conditionality
inflicts disproportionate burdens, making
conditionality rarely appropriate. 

Second, some might suggest that the Fair 
Priority Model unfairly disadvantages coun- 
tries that have effectively suppressed viral 
transmission without a vaccine and rewards 
those who have responded ineffectively. 

A fair distribution of vaccine among coun- 
tries must mitigate future health, economic, 
and other harms spawned by COVID-19. It 
should not be backward looking, punishing 
or rewarding countries for their COVID-19 
response or aiming to redress past injustices. 
The individuals whose lives and livelihoods 
are at risk often had little say in their govern- 
ments’ response to COVID-19. Further, medi- 
cine espouses treating people regardless of 
responsibility for their illness. Smokers who 
develop lung cancer and malaria patients 
who did not use bed nets are not denied care. 

Moreover, though the Fair Priority Model 
recommends allocating vaccine on the ba- 
sis of expected benefits, it does not exclude 
countries that have effectively suppressed 
COVID-19 transmission by making economic 
sacrifices. If these sacrifices translate into 
ongoing economic harms that vaccines can 
alleviate—an empirical question—they are 
addressed in phase 2. Waiting until phase 2 
to address these economic harms is appro- 
priate because premature deaths are more 
urgent and less compensable. Furthermore, 
development aid might address the effects 
of economic sacrifices more effectively than 
COVID-19 vaccines. 

Third, some might worry that the metrics 
are too uncertain and demanding to calcu- 
late, or could perversely incentivize coun- 
tries to exaggerate the spread and harm of 
COVID-19 to secure more vaccine earlier. 

In a novel, rapidly evolving pandemic, any 
approach sufficiently sophisticated to mean- 
ingfully operationalize ethical values will re- 
quire approximations as well as judgments 
about the relative weight to assign different 
metrics, such as SEYLL and the poverty gap. 
Simple metrics like population size avoid ap- 
proximations and trade-offs but fail to mea- 
sure what morally matters. Moreover, the 
proposed metrics are routinely used in global 
health, and basing vaccine distribution on 
these metrics will encourage collection and 
reporting of accurate data on changes in 
mortality and poverty related to COVID-19. 

Regarding perverse incentives, countries 
are unlikely to exaggerate the spread and 
harm of COVID-19 to secure more vaccine. 
Any temptation to exaggerate suffering from 
the pandemic will be tempered by a country’s 
need to reassure its public, visitors, inves- 
tors, and others about control of COVID-19 to 
stimulate economic activity and allow travel. 
Also, as Taiwan and New Zealand show, there 
are notable soft power advantages associated
with an effective pandemic response. 


CONCLUSION 

The Fair Priority Model is the best embodi- 
ment of the ethical values of limiting harms, 
benefiting the disadvantaged, and recogniz- 
ing equal concern. The responsibility for im- 
plementing the model rests with countries, 
international organizations, and vaccine 
producers. They need to use the cooperative 
mechanisms that have been created to deal 
with the pandemic, such as the COVAX fa- 
cility. Organizations also have indispensable 
roles in empirically assessing how vaccine 
distribution in fact affects countries with 
respect to metrics like SEYLL, poverty, and 
GNI. Ultimately, the model offers govern- 
ments, international organizations, and vac- 
cine producers a practical way to fulfill their 
pledges to distribute vaccine fairly and equi- 
tably, and make their words a reality. 


NIH must confront the
use of race in science


Recent protests across the United States 
and the world have called attention to 
anti-Black racism in policing, employ- 
ment, housing, and education. Science 
and medicine also have long histories of 
racism (1, 2). This unfortunate yet persistent
aspect of science and medicine
includes the use of obsolete concepts of 
race to measure human biological differ- 
ence and the false belief, by some, that 
differences in disease outcomes stem 
primarily from pathophysiological differ- 
ences between racial groups (3, 4). 

We are particularly concerned that 
explanations for the disproportionate 
rates of coronavirus disease 2019 (COVID- 
19) in Black, Latino, Indigenous, and 
other communities of color will mistak- 
enly point to innate racial differences 
instead of long-standing institutional- 
ized racism and other underlying social, 
structural, and environmental determi- 
nants. Although genetic risk factors may 
contribute to severity of COVID-19 (5, 6), 
race is a poor proxy to understand the 
population distribution of such risk fac- 
tors (7). Compelling evidence shows that 
racism, not race, is the most relevant risk 
factor (8, 9). We are hopeful that scientists 
will not turn to racial science—a reflection 
of long-standing beliefs about superiority 
and inferiority that have no place in sci-
entific and clinical practice (7, 10)—to 
explain COVID-19 disparities and justify 
policy responses to it. However, racial cat- 
egories have been misused in the past. 

In 2016, we called for the elimination 
of the use of race as a means to classify 
biological diversity in both laboratory and 
clinical research. Since that time, little 
has changed (11). The National Institutes 
of Health (NIH) made progress by releas- 
ing a request for applications in support 
of research leading to the creation of best 
practices for the study of race and other 
population identifiers (12). However, R01
awards could take years to address these 
issues, and NIH still offers no guidance 
about the use of racial and ethnic identifi- 
ers in research beyond recruitment. There 
is an urgent need for NIH to provide 
scientists with information about what 
utility racial data have beyond fostering 
diversity in research, how such informa- 
tion should or should not be used in data 
analysis, and what identifiers of human 
populations might be better suited for use 
in biomedical research. 

To begin to address the misuse of racial 
measures in scientific and clinical prac- 
tice, we urge the director of NIH to lead 
education efforts directed at both scien- 
tists and the public about the nature of 
human genetic diversity and the ongoing 
need and obligation to confront racism in 
science. In these troubled times, a clear 
statement regarding use and misuse of 
population identifiers in the pursuit of 
characterizing human difference could
help alleviate ongoing and widespread
confusion on such matters.

[Photo: A member of the Black Doctors COVID-19 Consortium,
formed to help address health disparities in the
African American community, tests a patient. Racial
disparities in COVID-19 cases are better explained
by structural racism than by genetic differences.]

NIH should then support the National 
Academy of Sciences to bring together a 
diverse group of scientists and scholars 
to develop a consensus statement on best 
practices in genetic, clinical, and social 
scientific studies for characterizing human 
genetic diversity, including guidance for 
using racial categories to study racism’s 
impact on human health. Guidelines 
for federally funded science should also 
include best practices for the integration of 
biological, social, structural, and environ- 
mental health determinants into the study 
of human health and disease. 

NIH should continue and expand its 
work to hire more career scientists and 
clinicians from underrepresented minor- 
ity groups. It should also substantially 
increase the extramural funding that sup- 
ports scientists from underrepresented 
groups at every level of training and 
throughout career development. We have 
the tools to remedy this challenge. The 
time to act is now. 

Michael Yudell1*, Dorothy Roberts2, Rob DeSalle3,
1Department of Community Health and Prevention,


Accumulation of plastic
waste during COVID-19


As lockdowns took effect to slow the 
spread of coronavirus disease 2019 
(COVID-19), the global demand for petro- 
leum collapsed. As a result, oil prices 
plummeted, making the manufacture of 
virgin plastics from fossil fuels less expen- 
sive than recycling (1). This cost incentive, 
along with lifestyle changes that increase 
plastic use, has complicated the challenge 
of overcoming plastic pollution. 

During the pandemic, personal protec- 
tive equipment (PPE) has driven increased 
plastic pollution. In response to high PPE 
demand among the general public, health 
care workers, and service workers, single- 
use face mask production in China soared 
to 116 million per day in February, about 
12 times the usual quantity (2). The World 
Health Organization has requested a 40% 
escalation of disposable PPE production (3). 
If the global population adheres to a stan- 
dard of one disposable face mask per day 
after lockdowns end, the pandemic could 
result in a monthly global consumption 
and waste of 129 billion face masks and 65
billion gloves (4). Hospitals in Wuhan, the
center of the COVID-19 outbreak, produced
more than 240 tons of single-use
plastic-based medical waste (such as disposable
face masks, gloves, and gowns) per day at
the peak of the pandemic, 6 times more
than the daily average before the pandemic
occurred (5). If the increases observed in
Wuhan hold true elsewhere, the United
States could generate an entire year’s worth
of medical waste in 2 months (6).

[Photo: Medical waste generated by COVID-19 protocols has
overwhelmed waste treatment facilities in Wuhan, China.]

Individual choices during lockdowns 
are also increasing plastic demand. 
Packaged take-out meals and home- 
delivered groceries contributed an addi- 
tional 1400 tons of plastic waste during 
Singapore’s 8-week lockdown (7). The 
global plastic packaging market size is 
projected to grow from USD 909.2 billion 
in 2019 to 1012.6 billion by 2021, at a com- 
pound annual growth rate of 5.5%, mainly 
due to pandemic response (8). 
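
The growth figures quoted above can be checked with a short compound-growth calculation. The Python sketch below is only an arithmetic illustration; the function names are hypothetical, and the inputs are the values cited in the text (USD 909.2 billion in 2019, USD 1012.6 billion in 2021, and a 5.5% annual rate).

```python
def compound_growth(start_value, annual_rate, years):
    """Value after compounding start_value at annual_rate for a number of years."""
    return start_value * (1.0 + annual_rate) ** years


def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start and end value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Market size in billions of USD, as cited in the text.
projected_2021 = compound_growth(909.2, 0.055, 2)   # about 1012.0
rate = implied_cagr(909.2, 1012.6, 2)               # slightly above 0.055
```

Two years of growth at 5.5% lands just under the cited 2021 figure, so the quoted rate appears to be rounded.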

This global health crisis puts extra 
pressure on regular waste management 
practices, leading to inappropriate man- 
agement strategies, including mobile 
incineration, direct landfills, and local 
burnings (9). Improper disposal of just 1% 
of face masks translates to more than 10 
million items, weighing 30,000 to 40,000 
kg (10). Waterlogged COVID-19-related 
plastic has been observed on beaches and 
in water (11), potentially aggravating the 
challenge of curtailing microplastics. 

At the regional and national levels, 
prioritization of human health over
environmental health has led to the delay or
reversal of policies aiming to reduce single- 
use plastic (9). As a result, demand for 
recycled plastic material has dropped, the 
profit margins of recycling have decreased, 
and the environmental footprint of plastics 
has increased (9). We need urgent and coor- 
dinated commitment to circular economy 
approaches, including recycling practices 
and strict policies against plastic pollution. 
Companies should continue efforts to cur- 
tail virgin plastic use and increase plastic 
recycling to live up to their corporate social 
and environmental responsibilities. Without 
a concerted effort to protect the environ- 
ment during and after the pandemic, we 
are unlikely to meet the United Nations’ 
Sustainable Development Goals (12). 

Tanveer M. Adyel 

Department of Civil Engineering, Monash
University, Clayton, Melbourne, VIC 3800,
Australia. Email: tanveer.adyel@monash.edu


REFERENCES AND NOTES

1. A. Kimini, “How the COVID-19 plastic boom could save
the oil industry,” OilPrice.com (2020).

2. F. Bermingham, S.-L. Tan, “Coronavirus: China's mask-making
juggernaut cranks into gear, sparking fears
of over-reliance on world's workshop,” South China
Morning Post (2020);
www.scmp.com/economy/global-economy/article/3074821/coronavirus-chinas-mask-making-juggernaut-cranks-gear.

3. “Shortage of personal protective equipment endangering
health workers worldwide” (WHO, 2020).

4. J. C. Prata, A. L. Patricio Silva, T. R. Walker, A. C. Duarte,
T. Rocha Santos, Environ. Sci. Tech. 54, 7760 (2020).

5. M. Zuo, “Coronavirus leaves China with mountains of
medical waste,” South China Morning Post (2020).

6. S. Cutler, “Mounting medical waste from COVID-19
emphasizes the need for a sustainable waste management
strategy” (Frost & Sullivan, 2020).

7. S. Bengali, “The COVID-19 pandemic is unleashing
a tidal wave of plastic waste,” The Los Angeles Times
(2020).

8. “COVID-19 impact on packaging market by material
type, application and region—global forecast to 2021,”
Business Insider (2020).

9. A. L. P. Silva et al., Sci. Total Environ. 742, 140565 (2020).

10. “In the disposal of masks and gloves, responsibility
is required” (World Wildlife Fund, 2020);
www.ww.it/scuole/?53500%2FNello-smaltimento-di-mascherine-e-guanti-serve-responsabilita [in Italian].

11. G. Stokes, “No shortage of surgical masks at the beach”
(Oceans Asia, 2020).

12. United Nations, “Sustainable development goals”
(2015); www.un.org/sustainabledevelopment/sustainable-development-goals/.


PHOTO: CHINE NOUVELLE/SIPA/NEWSCOM


10.1126/science.abd9925


Microplastic’s role 
in antibiotic resistance 


Plastic pollution is universal and now 
viewed as an emerging environmental 
and human health crisis (1, 2). Successful
management of plastic waste (3) is vital 
to meeting United Nations Sustainable 
Development Goal 14, which aims to pro- 
tect marine ecosystems from pollution 
and other threats (4). Plastic pollution is 
projected to escalate over the upcoming 
decades (5, 6), but critical knowledge gaps 
and uncertainties remain about its effects. 
Evidence that microplastic surfaces in 
aquatic environments host microorgan- 
isms that are resistant to antibiotics (7, 8) 
suggests that plastic pollution could have 
ramifications on disease transmission and 
treatment in addition to environmental 
consequences and human exposure to 
contaminated air, water, and food.

Bacterial biofilms found on micro- 
plastics in aquatic ecosystems have been 
shown to include bacteria with antibiotic- 
resistant genes (7, 8). These resistant 
bacteria likely originate in human and 
animal populations treated with antibiot- 
ics and then travel downstream through 
wastewater into riverine and marine 
ecosystems (9). The increasing surface 
area provided by waste plastics, such as 
polyethylene, may enable higher rates of 
biofilm growth, including those contain- 
ing antibiotic-resistant genes (7). The pos- 
sibility that plastic pollution can facilitate 
resistance to antibiotics has critical impli- 
cations for the spread of disease and the 
management and regulation of antibiotic 
resistance in the environment (10).

Although scientists have made impor- 
tant strides in understanding the direct 
effects of microplastics on animal and 
plant life (11), the indirect effects of
plastic pollution, including the sources 
and transport dynamics of antibiotic 
resistance, remain unclear. Scientists and 
policy-makers should prioritize the evaluation
of both direct and indirect effects of
plastic pollution to fully assess the envi- 
ronmental and public health risks. 


Michael S. Bank1,2*, Yong Sik Ok3,4,


Comment on “No consistent ENSO response
to volcanic forcing over the last millennium”


Alan Robock

Dee et al. (Reports, 27 March 2020, p. 1477)
claimed that large volcanic eruptions do
not produce a detectable El Niño response.
However, they come to the wrong conclusion
because they have ignored the fundamental
climate response to large volcanic eruptions:
Volcanic eruptions cool the surface,
thus masking the relative El Niño warming.
Full text: dx.doi.org/10.1126/science.abc0502


Response to Comment on “No consistent 
ENSO response to volcanic forcing over the last 
millennium” 


Sylvia G. Dee, Kim M. Cobb, Julien Emile-Geay, 
Toby R. Ault, R. Lawrence Edwards, Hai Cheng, 
Christopher D. Charles 

Robock claims that our analysis fails to 
acknowledge that pan-tropical surface 
cooling caused by large volcanic eruptions 
may mask El Niño warming at our central
Pacific site, potentially obscuring a
volcano-El Niño connection suggested in previous
studies. Although observational support for
a dynamical response linking volcanic cooling
to El Niño remains ambiguous, Robock
raises some important questions about our 
study that we address here. 

Full text: dx.doi.org/10.1126/science.abc1733 


Published by AAAS 



TECHNICAL COMMENTS 


Cite as: A. Robock, Science 
10.1126/science.abc0502 (2020). 


Comment on “No consistent ENSO response to
volcanic forcing over the last millennium”


Alan Robock 


Department of Environmental Sciences, Rutgers University, New Brunswick, NJ 08901, USA. 


Email: robock@envsci.rutgers.edu 


Dee et al. (Reports, 27 March 2020, p. 1477) claimed that large volcanic eruptions do not produce a
detectable El Niño response. However, they come to the wrong conclusion because they have ignored the
fundamental climate response to large volcanic eruptions: Volcanic eruptions cool the surface, thus
masking the relative El Niño warming.


The recent report by Dee et al. (1) claims that large volcanic
eruptions do not produce a detectable El Niño response.
However, they come to the wrong conclusion because the
coral temperature reconstructions they use measure actual
sea surface temperature (SST) and not the temperature relative
to the rest of the tropics. Volcanic eruptions cool the
surface, thus masking the relative El Niño warming, if expressed
in absolute temperature changes. El Niño is a dynamical
ocean response, which warms the eastern and
central tropical Pacific with respect to the surrounding water.
When the entire tropics cools in response to stratospheric
aerosols from volcanic eruptions, the absolute
temperature in the El Niño region will cool, too, and the
impact will only be clear with respect to the surrounding 
region. This is called the relative SST (RSST), as shown by 
Khodri et al. (2). 
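
The distinction between absolute and relative SST can be illustrated with a minimal sketch. The anomaly values below are hypothetical and chosen only for illustration (they are not data from Dee et al. or Khodri et al.), and the function name `relative_sst` is an assumption of this example.

```python
def relative_sst(local_anomaly, tropical_mean_anomaly):
    """Relative SST anomaly: local anomaly minus the tropics-wide mean anomaly."""
    return local_anomaly - tropical_mean_anomaly


# Hypothetical post-eruption anomalies in degrees C (illustrative only).
tropical_mean = -0.5    # tropics-wide cooling after an eruption
central_pacific = -0.1  # absolute anomaly at a central Pacific site

rsst = relative_sst(central_pacific, tropical_mean)
print(f"absolute anomaly: {central_pacific:+.1f} C, relative anomaly: {rsst:+.1f} C")
```

In this toy case the site cools in absolute terms yet is warm relative to the rest of the tropics, which is the situation described above.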

Dee et al. have produced a valuable climate record by
using oxygen isotope records from Palmyra corals to give a
record of SST near the center of the region in the central
Pacific Ocean that warms during an El Niño relative to the
regions around it. But this SST is affected both by large-scale
climate change and by local El Niños. Whereas Dee et
al. used RSST in their analysis of climate model simulations
in their figure S8, the basic results in figure 4 consider only
raw SST, without accounting for the cooling effects of the
volcanic eruptions. This is because they do not have a reliable
way to calculate the tropical average temperature from
proxies, so it is important to interpret the actual SST record
they have produced. In this comment I am only addressing
the interpretations from the Palmyra δ¹⁸O temperature
reconstructions, and not the climate model results, because
climate models still imperfectly simulate the El Niño
response to volcanic eruptions, as can be seen by the large
differences in the climate model simulations shown in
figure 4.

The smaller eruptions, shown in figure 4, A and B,
would not be expected to show a strong El Niño signal
because of the small radiative forcing. Figure 4D shows the
signal from the largest eruption in their study, the 1257 CE
Samalas eruption. Rather than showing the expected large
cooling in year 1 after the eruption, they found basically no
signal, which I interpret as an El Niño, counteracted by the
volcanic cooling. In fact, if there had not been an El Niño,
we would expect to see significant cooling. The dynamical
response of the climate system that triggers El Niños does
not in general produce larger El Niños for larger eruptions,
so the El Niño after this largest eruption would be expected
to show a weaker absolute SST warming signal than that
from smaller eruptions, because the volcanic cooling would
be larger.

Timmreck et al. (3) have suggested that larger aerosol
particles from larger SO₂ stratospheric injections from larger
eruptions would make the radiative forcing less than linear
as a function of SO₂ input. Still, the radiative forcing as
shown in figure 1 of Dee et al. for Samalas is approximately
twice that of the average of the next three largest eruptions,
and therefore we should expect twice as much cooling from
that eruption. Guillet et al. (4) examined Northern Hemisphere
responses to the Samalas eruption and found “that
1258 and 1259 experienced some of the coldest Northern
Hemisphere summers of the past millennium.” They also
found that “in North America, volcanic radiative forcing was
modulated by a positive phase of the El Niño-Southern
Oscillation,” evidence indeed that the Samalas eruption
produced an El Niño.

Figure 4C of Dee et al. shows the SST signal averaged
for the four largest eruptions, including Samalas. Even with
the 0.0‰ signal from Samalas at lag 1 year, figure 4C shows
a strong El Niño signal, significant at close to the 95%
significance level. If the 0.0‰ value from Samalas had not
been included in this average, the signal would have been
higher by one-third of the signal and would have been
0.13‰, not the current 0.10‰ at lag 1 year, and would
certainly have been significant at a 95% level.
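
The arithmetic behind the 0.13‰ figure can be checked with a short sketch: a four-member mean of 0.10‰ in which one member contributes 0.0‰ implies a three-member mean one-third larger. The helper name `mean_without_zero_member` is introduced here for illustration only.

```python
def mean_without_zero_member(mean_all, n_all):
    """Mean of the remaining members after dropping one member whose value is 0."""
    total = mean_all * n_all  # the zero-valued member adds nothing to the total
    return total / (n_all - 1)


composite = 0.10  # per mil, four-eruption composite at lag 1 year (from the text)
without_samalas = mean_without_zero_member(composite, 4)
print(round(without_samalas, 2))  # 0.13
```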

If we take into account the expected cooling from volcanic
eruptions, the results from Dee et al. show a clear El
Niño signal from the largest eruptions they considered. The
El Niño SST signal for the largest eruption is obscured by
the cooling effect of the eruption. The El Niño SST signal
from the next three largest eruptions is clear even when
looking at the absolute SST signal.

Robock claims that our analysis fails to acknowledge that pan-tropical surface cooling caused by large
volcanic eruptions may mask El Niño warming at our central Pacific site, potentially obscuring a
volcano-El Niño connection suggested in previous studies. Although observational support for a dynamical response
linking volcanic cooling to El Niño remains ambiguous, Robock raises some important questions about our
study that we address here.


Modeling studies suggest that the El Niño-Southern Oscillation
(ENSO) is sensitive to sulfate aerosol forcing associated
with explosive volcanism, yet observational support for a
dynamical chain of events linking large volcanic cooling to
El Niño occurrences remains inconclusive. In Dee et al. (1),
we used absolutely dated fossil corals from the central tropical
Pacific to test ENSO’s response to large volcanic eruptions.
Superposed epoch analysis reveals a weak tendency
for an El Niño-like response in the year after an eruption,
but this response is not statistically significant, nor does it
appear after the outsized 1257 Samalas eruption. Dee et al.
suggested that models showing a strong ENSO response to
volcanic forcing may overestimate the size of the forced
response relative to natural ENSO variability. In a recent
comment (2), Robock raises relevant questions about the
conclusions of Dee et al., addressed below.
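
For readers unfamiliar with superposed epoch analysis, the following is a minimal sketch of the compositing step only. It assumes an annually resolved series and uses made-up values and event positions; it is not the procedure or the data of Dee et al., and it omits the significance testing that such an analysis requires. The function name `superposed_epoch` and its parameters are hypothetical.

```python
def superposed_epoch(series, event_indices, before=3, after=3):
    """Average windows of `series` aligned on each event index.

    Each window is expressed as an anomaly relative to its own pre-event mean.
    Events whose window would run off either end of the series are skipped.
    Returns composite values for lags -before..+after.
    """
    windows = []
    for i in event_indices:
        if i - before < 0 or i + after >= len(series):
            continue
        baseline = sum(series[i - before:i]) / before
        windows.append([series[j] - baseline for j in range(i - before, i + after + 1)])
    if not windows:
        raise ValueError("no event has a complete window")
    n_lags = before + after + 1
    return [sum(w[k] for w in windows) / len(windows) for k in range(n_lags)]


# Hypothetical annual anomaly series and eruption positions (illustrative only).
series = [0.0, 0.1, -0.1, 0.0, 0.3, 0.1, 0.0, -0.1, 0.1, 0.0, 0.4, 0.0, -0.1, 0.1]
events = [3, 9]
composite = superposed_epoch(series, events, before=3, after=3)
for lag, value in zip(range(-3, 4), composite):
    print(f"lag {lag:+d}: {value:+.3f}")
```

Whether a composite value at lag +1 is distinguishable from internal variability is a separate question, typically addressed by resampling event dates.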

First, Robock advocates the use of relative sea surface
temperature (RSST) to separate the ENSO signal from
tropics-wide volcanic cooling. RSST focuses on spatial gradients
and/or anomalies with respect to a large-scale (i.e., global or
basin-scale) average, facilitating isolation of dynamical
responses in the midst of tropics-wide warming or cooling.
Although we agree that RSST is a powerful diagnostic tool
for the study of volcano-ENSO dynamics, individual
paleoclimate records such as those presented in our study reflect
local changes in absolute SST. Our constraints come from
monthly-resolved coral δ¹⁸O anomalies at a single site
(Palmyra atoll, northern Line Islands), which offer the
advantage of a consistent interpretation with
well-characterized uncertainties, but, absent other
monthly-resolved constraints on tropical Pacific SST during this time,
preclude the computation of RSST. That said, we undertook
an investigation of RSST in climate model output, as we
show in figure S8 of (1). Yet another approach would use
paleoclimate data assimilation products [e.g., (3)] spanning
the last millennium. The latter dataset represents a more
integrated approach, yet suffers from additional uncertainties
with respect to the calculation of RSST (e.g., uneven
proxy coverage, imperfect climate model “priors”). As such,
even though our reliance on a single well-dated,
well-characterized paleoclimate reconstruction is far from ideal,
our study does add a new physically based constraint to
ongoing research on this key question.

The second part of Robock’s argument concerns the lack
of apparent cooling in the Palmyra coral δ¹⁸O, given “the
expected large cooling” that follows sufficiently large volcanic
eruptions. This expectation is based on decades of studies
on instrumental climate data and models [see (1, 4-11)] but
is not quantitatively supported by observations. Indeed, the
lack of evidence for large cooling after the eruption of the
Samalas complex in 1257 CE (4), the strongest eruption of
the last millennium, has long puzzled researchers (5). As
Robock points out, the radiative scaling to aerosol loading is
sublinear, so one would expect a response that is somewhat
less than twice as large. Recently, Guillet et al. (6) used both
tree-ring mixed latewood density data (a sensitive proxy for
summer surface air temperature) and highly calibrated
documentary data to constrain this response, finding a relatively
weak local temperature response in comparison to the
inferred Samalas forcing. Although PMIP3 model simulations
do simulate a large cooling in response to Samalas,
recent studies suggest that the relatively simplistic
representation of stratospheric aerosols in these models leads to
exaggerated radiative forcing (7, 8), particularly for Samalas
(9). The inferred amplitude of the forcing used to drive climate
models is derived from ice-core sulfate time series, yet
the application of this forcing in models yields a large
spread in simulated post-eruption climatic responses for
CMIP5 experiments (10). As we point out in the text, the
variable responses could be a result of uncertain forcing
and/or structural uncertainties in model physics. Indeed,
several volcanic forcing intermodel comparison projects are
ongoing to identify the source of this spread [e.g., (10, 11)].
Potentially important details include the eruption month
[often unknown, yet critical for ENSO modulation (12)], the
stratospheric injection height, and the relative forcing by
extratropical (versus tropical) eruptions (13). None of these
details were resolved in the volcanic forcing applied to
PMIP3- and CMIP5-era model experiments, casting doubt
on the expectations derived from them.

Third, Robock, citing (6), argues “that 1258 and 1259
experienced some of the coldest Northern Hemisphere
summers of the past millennium.” Recent reconstructions [see
(3) and (14), figure S2b] support the notion that these were
indeed cold years, but not exceptionally so in the context of
the past millennium. Some compensation may occur, but
the fact that Palmyra δ¹⁸O anomalies in that year are neutral
suggests that if there was an El Niño, it compensated for a
minor global cooling, which does not advocate for a strong
volcanic effect on ENSO. Guillet et al. (6)’s suggestion of a
possible El Niño phase in 1259 is actually based on work by
one of the authors (15), and its conclusions stem from a
small number of proxy records located far from the core
ENSO region. Our new record from the heart of the ENSO
region (1) provides a more direct constraint on the tropical
Pacific’s response to this eruption.

Finally, Robock questions the arbitrariness of the 95% 
confidence level in ruling out a “significant” influence of 
volcanic forcing on ENSO. Although the choice of a 5% test 
level (false positive rate) is indeed arbitrary, it follows best 
statistical practice because it was chosen before the calcula- 
tion was made, and we applied it consistently throughout 
the analysis. Adjusting this threshold a posteriori to support 
a particular hypothesis would constitute a form of confirma- 
tion bias. 

In summary, as the text makes clear, our study does not 
rule out a possible volcanic influence on ENSO state; rather, 
it asserts that the currently available data do not uniformly 
support such an effect. Absence of evidence is not evidence 
of absence, and the effect expected by Robock may be revealed
at some point in more comprehensive SST reconstructions
based on more abundant monthly-resolved coral
observations spanning the global tropics. Until then, the 
lack of evidence for such a response in our analysis is con- 
sistent with the null hypothesis—that the internal variability 
in ENSO is as large as, if not larger than, any volcanically 
forced signal in ENSO characteristics. 

By Laura M. Zahn


The panorama of human phenotypes
arises from a mix of common and rare 
genetic variants, some of which affect 
how genes are expressed and spliced 
throughout the body. More than a de- 
cade ago, scientists aiming to better 
understand the effects of genetic diver- 
sity in healthy individuals launched the 
Genotype-Tissue Expression (GTEx) 
Consortium. Here, Science unveils the third 
and final phase of the project, presenting the 
results from the analysis of the version 8 (v8) 
GTEx release. 

The v8 data release includes an increased number
of tissues and individuals, which allows for
more accurate mapping of putatively causal variants and
identifies cell type-specific differences
in gene expression. The increased
size of the study also provides the
power to link genetic variation to
gene expression, both proximally and distally in
the genome, so that cis and trans effects as well
as population- and sex-specific differences in
gene expression can be detected.

Heatmaps of differential gene expression
among individuals from the
Genotype-Tissue Expression Project.
From left to right, illustrations
represent data obtained for
the lungs, heart, intestines, and thyroid.

The efforts of the GTEx Consortium have led 
to the development of numerous tools, includ- 
ing Watershed, and have provided a comprehen- 
sive resource for the scientific community. The 
GTEx project has established a foundation to 
elucidate how genetic variants affect gene regu- 
lation and quantitative traits in humans. Such 
studies of genetic variation and tissue specificity 
inform on properties of the genome—including 
noncoding elements and the telomeres found 
at chromosome ends—and help us understand 
how gene variants influence ag- 
ing and disease. This work sets 
the stage for future exploration 
into the effects of the common 
and rare variants that underlie 
the gamut of humanity. 

RESEARCH ARTICLE


HUMAN GENOMICS


The GTEx Consortium atlas of genetic regulatory
effects across human tissues


The GTEx Consortium*


The Genotype-Tissue Expression (GTEx) project was established to characterize genetic effects on
the transcriptome across human tissues and to link these regulatory mechanisms to trait and disease
associations. Here, we present analyses of the version 8 data, examining 15,201 RNA-sequencing
samples from 49 tissues of 838 postmortem donors. We comprehensively characterize genetic
associations for gene expression and splicing in cis and trans, showing that regulatory associations are
found for almost all genes, and describe the underlying molecular mechanisms and their contribution
to allelic heterogeneity and pleiotropy of complex traits. Leveraging the large diversity of tissues,
we provide insights into the tissue specificity of genetic effects and show that cell type composition is a
key factor in understanding gene regulatory mechanisms in human tissues.


The characterization and interpretation of
the function of the millions of genetic
variants across the human genome remain
a pressing need in human genetics.
Understanding the effects of genetic variation
is essential for identifying the molecular
mechanisms of genetic risk for complex traits
and diseases, which are mainly driven by noncoding
loci with largely uncharacterized regulatory
functions. To address this challenge,
several projects have built comprehensive
annotations of genome function across tissues
and cell types (1, 2) and mapped the effects
of regulatory variation across large numbers of
individuals, primarily from whole blood and
blood cell types (3-5). The Genotype-Tissue
Expression (GTEx) project provides an essential
intersection where variant function can be
studied across a wide range of both tissues
and individuals.

The GTEx project was launched in 2010 with
the aim of building a catalog of genetic effects
on gene expression across a large number of
human tissues to elucidate the molecular mechanisms
of genetic associations with complex
diseases and traits and to improve our understanding
of regulatory genetic variation (6).
The project set out to collect biospecimens
from ~50 tissues from up to ~1000 postmortem
donors and to create standards
and protocols for optimizing postmortem
tissue collection and donor recruitment (7, 8),
biospecimen processing (7), and data sharing
(www.gtexportal.org).

Following the GTEx pilot (9) and midstage
results (10), we present a final analysis of the
version 8 (v8) data release from the GTEx Consortium.
We provide a catalog of genetic regulatory
variants affecting gene expression and
splicing in cis and trans across 49 tissues
and describe patterns and mechanisms of
tissue and cell type specificity of genetic regulatory
effects. Through integration of GTEx
data with genome-wide association studies
(GWASs), we characterize mechanisms of how
genetic effects on the transcriptome mediate
complex trait associations.


*A full list of the GTEx authors and their affiliations is available at
the end of this article.


Quantitative trait locus (QTL) discovery 


The GTEx v8 dataset, after quality control (11),
consists of 838 donors and 17,382 samples
from 52 tissues and two cell lines. In the
analysis of this study, we used 49 tissues or
cell lines that had at least 70 individuals for
which both RNA sequencing (RNA-seq) and
genotype data from whole-genome sequencing
(WGS) were available, for a total of 15,201
samples from 838 donors (Fig. 1A and figs. S1
and S2). Of the 838 donors, 715 (85.3%) were
European American, 103 (12.3%) African American,
and 12 (1.4%) Asian American, with 16 (1.9%)
reporting Hispanic or Latino ethnicity; 557
(66.4%) donors were male and 281 (33.5%)
female (fig. S1). WGS was performed for each
donor to a median depth of 32x, resulting in
the detection of 43,066,422 single-nucleotide
variants after quality control and phasing
[10,008,325 with minor allele frequency (MAF) ≥
0.01] and 3,459,870 small indels (762,535 with
MAF ≥ 0.01) (fig. S3 and table S1) (11). The
mRNA of each of the tissue samples was sequenced
to a median depth of 82.6 million
reads, and alignment, quantification, and quality
control were performed as described in (11)
(figs. S4 to S6).

The resulting data provide a broad survey
of individual- and tissue-specific gene expression,
enabling a comprehensive view of the
impact of genetic variation on gene regulation
(Fig. 1B). We mapped genetic loci that affect
the expression (eQTL) or splicing (sQTL) of
protein-coding and long intergenic noncoding
RNA (lincRNA) genes, both in cis and trans.
Genes with an eQTL or sQTL are called eGenes
and sGenes, respectively, and the corresponding
significant variants are called eVariants
and sVariants, respectively.


V. ALTOUNIAN/SCIENCE, IN COLLABORATION WITH CHRISTIAN STOLTE (DATA) GTEX CONSORTIUM


[Fig. 1A tissue labels, with sample numbers in parentheses: Lung (515); Breast mammary tissue (396);
Pancreas (305); Liver (208); Adrenal gland (233); Kidney cortex (73); Kidney medulla (4);
Visceral omentum (469); Small intestine terminal ileum (174); Fallopian tube (8); Ovary (167);
Uterus (129); Not sun-exposed skin (suprapubic) (517); Ectocervix (9); Vagina (141); Endocervix (10);
Sun-exposed skin (lower leg) (605); Cultured fibroblasts (483); Subcutaneous adipose (581);
Skeletal muscle (706)]

Fig. 1. Sample and data types in the GTEx v8 study. (A) Illustration of the
54 tissue types examined (including 11 distinct brain regions and two cell lines),
with sample numbers from genotyped donors in parentheses and color coding
indicated in the adjacent circles. Tissues with 70 or more samples were included
in QTL analyses. (B) Illustration of the core data types used throughout the
study. Gene expression and splicing were quantified from bulk RNA-seq of
heterogeneous tissue samples, and local and distal genetic effects (cis-QTLs and
trans-QTLs, respectively) were quantified across individuals for each tissue.


Across all tissues, we discovered cis-eQTLs
[5% false discovery rate (FDR) per tissue (11),
with 1% FDR results shown in fig. S7] for
18,262 protein-coding and 5006 lincRNA genes
[23,268 genes with a cis-eQTL (i.e., cis-eGenes)
corresponding to 94.7% of all protein-coding
and 67.3% of all lincRNA genes detected in
at least one tissue], with a total of 4,278,636
genetic variants (43% of all variants with
MAF ≥ 0.01) that were significant in at least
one tissue (cis-eVariants) (Fig. 2A, figs. S7 and
S8, and table S2). The discovered eQTLs had a
high replication rate in external datasets (figs.
S12 and S13). Cis-eQTLs for all long noncoding
RNAs (lncRNAs), which include lincRNAs and
other types, are characterized in (12). The genes
lacking a cis-eQTL were enriched for those
lacking expression in the tissues analyzed
by GTEx, including genes involved in early
development (fig. S9). While most of the
discovered cis-eQTLs had small effect sizes
measured as allelic fold change (aFC), across
tissues an average of 22% of cis-eQTLs had a
greater than twofold effect on gene expression
(fig. S14). We mapped sQTLs in cis with
intron excision ratios from LeafCutter (11, 13)
and discovered 12,828 (66.5%) protein-coding
and 1600 (21.5%) lincRNA genes (14,424 total)
with a cis-sQTL (5% FDR per tissue) in at least
one tissue (cis-sVariants) (Fig. 2A and table
S2; with 1% FDR results shown in fig. S7). As
expected (10), cis-QTL discovery was highly
correlated with the sample size for each tissue
[Spearman’s rank correlation coefficient (ρ) =
0.95 for cis-eQTLs and 0.92 for cis-sQTLs]. The
increased cis-eQTL discovery in larger tissues
is primarily driven by additional power to discover
small effects, with discovery of cis-eGenes
with a greater than twofold effect saturating
at ~1500 genes in tissues with >200 samples
(fig. S14).

Previous studies have shown widespread
allelic heterogeneity of gene expression in cis,
that is, multiple independent causal eQTLs
per gene (4, 14, 15). We mapped independent
cis-eQTLs and cis-sQTLs using stepwise regression,
where the 5% FDR threshold for significance
was defined by the single cis-QTL
mapping (10). We observed widespread allelic
heterogeneity, with up to 50% of eGenes having
more than one independent cis-eQTL in the
tissues with the largest sample sizes (Fig. 2B
and fig. S10). Our analysis captured a lower
rate of allelic heterogeneity for cis-sQTLs, which
could be a result of both underlying biology
and lower power in cis-sQTL mapping (fig.
S10). These results highlight gains in cis-eQTL
mapping with increasing sample sizes, even
when the discovery of new eGenes in specific
tissues starts to saturate.

Interchromosomal trans-eQTL mapping
yielded 143 trans-eGenes (121 protein-coding
and 22 lincRNA at 5% FDR assessed at the
gene level, separately for each gene type), after
controlling for false positives due to read
misalignment (11, 16) (table S13). The number
of trans-eGenes discovered per tissue is correlated
with sample size (Spearman’s ρ = 0.68)
and with the number of cis-eQTLs (Spearman’s
ρ = 0.77), with outlier tissues such as testis
contributing disproportionately to both cis and
trans (Fig. 2C). We identified a total of 49
trans-eGenes in testis, 47 of which were found in no
other tissue even at FDR 50%. Greater than
twofold effect sizes on trans-eGene expression
were observed for 19% of trans-eQTLs (fig. S14).
Trans-sQTL mapping yielded 29 trans-sGenes
(5% FDR per tissue), including a replication of
a previously described trans-sQTL (3) and visual
support of the association pattern in several loci
(11) (fig. S11 and table S14). These results suggest
that while trans-sQTL mapping is challenging,
we can discover robust genetic effects on splicing
in trans.


Fig. 2. QTL discovery. (A) The number of genes with a cis-eQTL (eGenes) or
cis-sQTL (sGenes) per tissue, as a function of sample size. See Fig. 1A for the
legend of tissue colors. (B) Allelic heterogeneity of cis-eQTLs depicted as
proportion of eGenes with one or more independent cis-eQTLs (blue stacked
bars; left y axis) and as a mean number of cis-eQTLs per gene (red dots; right
y axis). The tissues are ordered by sample size. (C) The number of genes
with a trans-eQTL as a function of the number of cis-eGenes. (D) Sex-biased
cis-eQTL for AURKA in skeletal muscle, where rs2273535-T is associated with
increased AURKA expression in males (P = 9.02 x 10-*”) but not in females
(P = 0.75). (E) Population-biased cis-eQTL for SLC44A5 in esophagus mucosa
[aFC = -2.85 and -4.82 in African Americans (AA) and European Americans
(EA), respectively; permutation P value = 1.2 x 10~°]. TPM, transcripts per million.


We produced allelic expression (AE) data
using two complementary approaches (11). In
addition to the conventional AE data for each
heterozygous genotype, we produced AE data
by haplotype, integrating data from multiple
heterozygous sites in the same gene, yielding
153 million gene-level measurements (≥8 reads)
across all samples (17). Allelic expression reflects
differential regulation of the two haplotypes
in individuals that are heterozygous for a
regulatory variant in cis; indeed, cis-eQTL effect
size is strongly correlated with allelic expression
(median Spearman’s ρ = 0.82) (10). We
hypothesized that cis-sQTLs could also partially
contribute to allelic imbalance, even if
only for parts of transcripts. However, there
is drastically less signal of increased allelic
imbalance among individuals heterozygous
for cis-sQTLs (median Spearman’s ρ = -0.05)
(fig. S15), which indicates that AE data primarily
capture cis-eQTL effects and that genetic splicing
variation in cis is not strongly reflected in
gene-level AE data.
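
Several results in this section are reported as Spearman's rank correlations (for example, between per-tissue sample size and QTL discovery, or between cis-eQTL effect size and allelic expression). As a minimal sketch of how such a coefficient is obtained, the code below ranks both variables and takes the Pearson correlation of the ranks. The sample sizes are taken from the Fig. 1A labels, but the eGene counts are hypothetical placeholders, not GTEx results, and the function names are introduced here for illustration.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


sample_sizes = [73, 208, 305, 515, 706]          # from Fig. 1A labels
egene_counts = [1500, 6000, 9500, 14000, 13500]  # hypothetical, illustrative only
print(round(spearman_rho(sample_sizes, egene_counts), 2))
```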


Genetic regulatory effects across populations 
and sexes 


Variability in human traits and diseases between
sexes and population groups likely partially
results from differences in genetic effects
(18-20). To study whether genetic regulatory
variants manifest such variability, we analyzed
variable cis-eQTL effects between males and
females, as well as between individuals of
European ancestry and those of African ancestry.
Because external replication datasets
are sparse, we developed an AE approach for
validation with an orthogonal data type from
the same samples (17): Allelic imbalance in
individuals heterozygous for the cis-eQTL allows
individual-level quantification of the cis-eQTL
effect size (21) and can be correlated with the
interaction terms used in cis-eQTL analysis
to validate modifier effects of the cis-eQTL
association (fig. S16).


Fig. 3. Fine-mapping of cis-eQTLs. (A) Number
of eGenes per tissue with variants fine-mapped
with >0.5 posterior probability of causality,
using three methods. The overall number of
eGenes with at least one fine-mapped eVariant
increases with sample size for all methods.
However, this increase is in part driven by
better statistical power to detect small effect
size cis-eQTLs (aFC < 1 in log2 scale; see also
fig. S14) with larger sample sizes, and the
proportion of well fine-mapped eGenes with
small effect sizes increases more modestly
with sample size (bottom versus top panels),
indicating that such cis-eQTLs are generally
more difficult to fine-map. (B) Enrichment of
variants among experimentally validated
regulatory variants, shown for the cis-eVariant
with the best P value (top eVariant), and
those with posterior probability of causality
>0.8 according to each of the three methods
individually or all of them (consensus). Error
bars: 95% confidence interval (CI). (C) The
cis-eQTL signal for CBX8 is fine-mapped
to a credible set of three variants (red and
purple diamonds), of which rs9896202
(purple diamond) overlaps a large number of
transcription factor binding sites in ENCODE
chromatin immunoprecipitation sequencing
(ChIP-seq) data and disrupts the binding motif
of EGR1. (D) The potential role of EGR1 binding
driving this cis-eQTL is further supported by
correlation between EGR1 expression and the
CBX8 cis-eQTL effect size across tissues.

To characterize sex-differentiated genetic ef- 
fects on gene expression in GTEx tissues, we 
mapped sex-biased cis-eQTLs (sb-eQTLs). Ana- 
lyzing the set of all conditionally independent 
cis-eQTLs, we identified eQTLs with signifi- 
cantly different effects between sexes by fitting 
a linear regression model and testing for a 
significant genotype-by-sex (GxS) interaction 
(11). Across the 44 GTEx tissues shared be- 
tween sexes, we identified 369 sb-eQTLs (FDR 
< 25%), characterized further in (22). Sex-biased 
eQTL discovery had a modest correlation with 
tissue sample size (Spearman’s p = 0.39, P = 
0.03), with most sb-eQTLs discovered in breast 
but others also discovered in muscle, skin, and 
adipose tissues. 
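
As an illustration of what a genotype-by-sex interaction term measures, the sketch below uses a simplified model, expression ~ genotype + sex + genotype:sex, with no further covariates. Under that assumption the interaction coefficient equals the difference between the genotype slopes fitted separately in each sex. The data are made up, the function names are hypothetical, and the sketch omits the covariates, significance testing, and FDR control used in the actual analysis.

```python
def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def gxs_interaction(genotype, sex, expression):
    """Interaction coefficient of expression ~ genotype + sex + genotype:sex.

    With sex coded 0/1, this equals the genotype slope in the sex == 1 group
    minus the genotype slope in the sex == 0 group.
    """
    groups = {0: ([], []), 1: ([], [])}
    for g, s, e in zip(genotype, sex, expression):
        groups[s][0].append(g)
        groups[s][1].append(e)
    return slope(*groups[1]) - slope(*groups[0])


# Hypothetical data: allele dosage (0/1/2), sex (1 = male, 0 = female), expression.
genotype = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
sex = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
expression = [1.0, 2.1, 2.9, 1.1, 1.9, 3.1, 1.0, 1.1, 0.9, 1.1, 0.9, 1.0]
print(round(gxs_interaction(genotype, sex, expression), 3))
```

In this toy example the genotype effect is present only in the group coded 1, so the interaction coefficient is close to that group's slope.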

In some cases, the cis-eQTL signal—identified 
with males and females combined—seems to 
be driven exclusively by one sex. For example, 
the cis-eQTL association of rs2273535 with the 
gene AURKA in skeletal muscle (cis-eQTL P = 
6.92 x 10°) is correlated with sex (Pexs = 
9.28 x 107!”, Storey gexs = 1.07 x 10°77, AE 
validation P = 1.15 x 10°") and present only 
in males (Fig. 2D and fig. S17). AURKA is a 
member of the serine and threonine kinase 
family involved in mitotic chromosomal segregation
that has been widely studied as a risk
factor in several cancers (23-26) and has re- 
cently been shown to be involved in muscle 
differentiation (27). 

We also characterized population-biased
cis-eQTLs (pb-eQTLs), where a variant’s molecular
effect on gene expression differs between
individuals of European and African ancestry,
controlling for differences in allele frequency,
linkage disequilibrium (LD), and covariates
(11). Analyzing 31 tissues with sample sizes
>20 in both populations, we mapped genes
with a different eQTL effect size measured
by aFC. After applying stringent filters to
remove differences potentially explained by
LD or other artifacts (fig. S18A), we identified
178 pb-eQTLs for 141 eGenes (FDR < 25%) that
show a moderate degree of validation in
allele-specific expression data (fig. S18, C and D, and
table S10).

While some of the pb-eQTL effects are
tissue specific, there are also effects that are
shared across most tissues (fig. S18E). Figure
2E shows an example of a pb-eQTL for the
SLC44A5 gene involved in transport of sugars
and amino acids, which is expressed at different
levels in the epidermis of lighter skin and
darker skin (reconstructed in vitro) (28, 29). In
Europeans, the derived allele of rs4606268
decreases expression of the gene in esophagus
mucosa (aFC = -4.82), but this effect is
significantly lower in African Americans (aFC =
-2.85, permutation P value = 1.2 × 10⁻⁷, AE
validation P = 0.002) (fig. S18C).
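
To make these effect sizes concrete, the sketch below converts the two aFC values quoted above to linear fold changes. It assumes the aFC values are expressed on a log2 scale (as the Fig. 3 legend's reference to "aFC < 1 in log2 scale" suggests); the helper name `log2_afc_to_fold` is introduced here for illustration.

```python
def log2_afc_to_fold(log2_afc):
    """Convert an allelic fold change on the log2 scale to a linear fold change."""
    return 2.0 ** log2_afc


# aFC values quoted in the text; the log2 interpretation is an assumption.
for label, afc in [("European Americans", -4.82), ("African Americans", -2.85)]:
    fold = log2_afc_to_fold(afc)
    print(f"{label}: log2 aFC = {afc:+.2f}, linear fold change = {fold:.3f}")
```

Under this reading, both alleles reduce expression substantially, with the reduction several times stronger in the European American group.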

Altogether, despite the relaxed FDR, we dis- 
covered only a few hundred sex- or population- 
biased cis-eQTLs out of tens of thousands of 
cis-eQTLs in GTEx, which indicates that there 
are few regulatory variants with major modi- 
fier effects and that these associations con- 
tinue to be challenging to identify without a 
much larger sample size. However, the dis- 
covered effects can provide insights into sex- 
or population-specific regulatory effects on 
gene expression. Importantly, factors corre- 
lated with sex or population—for example, cell 
type composition or environmental exposures— 
may contribute to sex- or population-biased 
cis-eQTLs. These effects are described in de- 
tail in (22). 


Fine-mapping 


A major challenge of all genetic association 
studies is to distinguish the causal variants from 
their LD proxies. We applied three different 
statistical fine-mapping methods, CaVEMaN 
(30), CAVIAR (31), and dap-g (32), to infer likely 
causal variants of cis-eQTLs in each tissue (Fig. 
3A) (11). For many cis-eQTLs, the causal variant 
can be mapped with a high probability to a 
handful of candidates. The 90% credible set 
for each cis-eQTL consists of variants that 
include the causal variant with 90% probability; 
using dap-g, we identified a median of six 
variants in the 90% credible set for each cis-eQTL 
(fig. S19). Furthermore, 9.3% of the cis-eQTLs 
have a variant with a posterior probability 
>0.8 according to dap-g, indicating a single 
likely causal variant for those cis-eQTLs. We 
defined a consensus set of 24,740 cis-eQTLs 
across all tissues (7709 unique variants), for 
which the posterior probability was >0.8 
across all three methods (fig. S20). Fine-mapped 
variants were significantly more 
enriched among experimentally validated 
causal variants from MPRA (33) and SuRE 
(34) compared with the lead eVariant across 
all eGenes (Fig. 3B). The highest enrichment 
was observed for the consensus set, although 
with overlapping confidence intervals (Fig. 3B). 
This demonstrates how careful fine-mapping 
facilitates the identification of likely causal 
regulatory variants. 

Fig. 4. Functional mechanisms of genetic regulatory effects. QTL enrichment 
in functional annotations for (A) cis-eQTLs and cis-sQTLs and for (B) trans-eQTLs. 
cis-QTL enrichment is shown as mean ± SD across tissues; trans-eQTL enrichment 
as 95% CI. UTR, untranslated region. (C) Enrichment of lead trans-eVariants 
or trans-sVariants that have been tested for cis-QTL effects also being significant 
cis-eVariants or cis-sVariants in the same tissue, respectively. Asterisk denotes 
significant enrichment, P < 10. (D) Proportion of trans-eQTLs that are significant 
cis-eQTLs or mediated by cis-eQTLs. (E) Trans associations of cis-mediating 
genes identified through colocalization (PP4 > 0.8 and nominal association with 
discovery trans-eVariant P < 10°). (Top) Associations for four thyroid cis-eQTLs 
(indicated by gene names); (bottom) cis-mediating genes with five or more 
colocalizing trans-eQTLs. 

Knowing the likely causal variant enables 
greater insights into the molecular mechanisms 
of individual eQTLs, including the mechanisms 
of their tissue-specific effects. Figure 3C shows 
an example of an eQTL for the gene CBX8 that 
colocalizes with breast cancer risk and birth 
weight (posterior probability = 0.68 for both 
in lung). One of the three variants in the 
confident set overlaps the binding site and 
disrupts the motif of the transcription factor 
EGR1 (11) (fig. S21). The role of EGR1 as an 
upstream driver of this eQTL is further supported 
by a cross-tissue correlation of the effect 
size of the eQTL and the expression level 
of EGR1 (Spearman's ρ = -0.69) (Fig. 3D). 
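
As a minimal sketch of the credible-set idea used in the fine-mapping analysis above, the snippet below builds a 90% credible set by adding variants in decreasing order of posterior probability until the cumulative probability reaches 0.9. The variant labels and posterior values are made up for illustration, and the function name `credible_set` is an assumption; it is not part of CaVEMaN, CAVIAR, or dap-g.

```python
def credible_set(posteriors, coverage=0.9):
    """Return the smallest top-ranked set of variants whose posteriors sum to >= coverage.

    posteriors: dict mapping a variant label to its posterior probability of
    being causal for one association signal (assumed to sum to at most 1).
    """
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        total += prob
        if total >= coverage:
            break
    return chosen, total


# Hypothetical posterior probabilities for a single cis-eQTL signal.
example = {"var_a": 0.55, "var_b": 0.25, "var_c": 0.12, "var_d": 0.05, "var_e": 0.03}
variants, mass = credible_set(example)
print(variants, round(mass, 2))
```

In this toy case three variants are needed to reach 90% coverage. A signal in which a single variant has a posterior above 0.8, as described for 9.3% of cis-eQTLs, would yield a much smaller set.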


Functional mechanisms of QTL associations 


Quantitative trait data from multiple molecular 
phenotypes, integrated with the regulatory 
annotation of the genome (table S3), offer a 
powerful way to understand the molecular 
mechanisms and phenotypic consequences of 
genetic regulatory effects. As expected, cis-eQTLs 
and cis-sQTLs are enriched in functional 
elements of the genome (Fig. 4A). Although the 
strongest enrichments are driven by variant 
classes that lead to splicing changes or 
nonsense-mediated decay, these account for relatively 
few variants. Cis-sQTLs are enriched almost 
entirely in transcribed regions, whereas 
cis-eQTLs are enriched in both transcribed 
regions and transcriptional regulatory elements. 
Previous studies (4, 35) have indicated that 
cis-eQTL and cis-sQTL effects on the same gene 
are typically driven by different genetic variants. 
This observation is corroborated by the GTEx 
v8 data, where the cis-eQTL credible sets 
of likely causal variants, from CAVIAR 
analysis, have only a 12% overlap with cis-sQTL 
credible sets (fig. S22). Functional enrichment 
of overlapping and nonoverlapping cis-eQTLs 
and cis-sQTLs, using stringent LD filtering, 
showed that the patterns characteristic of 
each type, such as enrichment of cis-eQTLs 
in enhancers and cis-sQTLs in splice sites, are 
even stronger for distinct loci (fig. S22). 

We hypothesized that eVariants and their 
target eGenes in cis are more likely to be in the 
same topologically associated domains (TADs) 
that allow chromatin interactions between more 
distant regulatory regions and target gene 
promoters (36). To test this supposition, we 
analyzed TAD data from ENCODE (11) and 
cis-eQTLs from matching GTEx tissues (table 
S3). Compared to matching random variant-gene 
pairs and controlling for distance from 
the transcription start site, cis-eVariant and 
cis-eGene pairs were significantly enriched for 
being in the same TAD [median odds ratio 
(OR) 4.55; all P < 10°"] (fig. S23). 

Fig. 5. Regulatory mechanisms of GWAS loci. (A) GWAS enrichment of 
cis-eQTLs, cis-sQTLs, and trans-eQTLs measured with different approaches: 
enrichment calculated from GWAS summary statistics of the most 
significant cis-QTL per eGene or sGene with QTLEnrich and LD score 
regression with all significant cis-QTLs (S-LDSC all QTLs), simple QTL overlap 
enrichment with all GWAS catalog variants, and LD score regression with 
fine-mapped cis-QTLs in the 95% credible set (S-LDSC credible set) 
and using posterior probability of causality as a continuous annotation 
(S-LDSC causal posterior). Enrichment is shown as mean and 95% CI. 
(B) Number of GWAS loci linked to eGenes or sGenes through colocalization 
(ENLOC) and association (PrediXcan), aggregated across tissues. 
(C) Concordance of mediated effects among independent cis-eQTLs for 
the same gene, shown for different levels of regional colocalization probability 
(RCP) (32), which is used as a proxy for the gene's causality. As the null, 
we show the concordance for LD matched genes without colocalization. 
(D) Proportion of colocalized cis-eQTLs with a matching phenotype for genes 
with different levels of rare variant trait association in the UK Biobank 
(UKB). (E) Horizontal GWAS trait pleiotropy score distribution for cis-eQTLs 
that regulate multiple versus a single gene (left) and for cis-eQTLs that 
are tissue-shared versus specific. 

Trans-eQTLs are enriched in regulatory 
annotations that suggest both pre- and 
posttranscriptional mechanisms (Fig. 4B). Unlike 
cis-eQTLs, trans-eQTLs are enriched in CTCF 
binding sites, suggesting that disruption of 
CTCF binding may underlie distal genetic 
regulatory effects, potentially via its effect on 
interchromosomal chromatin interactions (36). 
Trans-eQTLs are also partially driven by cis- 
eQTLs (37, 38), with a significant enrichment of 
lead trans-eVariants among cis-eVariants in the 
same tissue (5.9x; two-sided Fisher’s exact test, 
P = 5.03 x 10 °”) (Fig. 4C). A lack of analogous 
enrichment suggests that cis-sQTLs are less 
important contributors to trans-eQTLs (P = 
0.064), and trans-sVariants had no significant 
enrichment of either cis-eQTLs (P = 0.051) or 
cis-sQTLs (P = 0.53). A further demonstra- 
tion of the important contribution of cis-eQTLs 
to trans-eQTLs is that, on the basis of media- 
tion analysis, 77% of lead trans-eVariants that 
are also cis-eVariants (corresponding to 31.6% 
of all lead trans-eVariants) appear to act through 
the cis-eQTL (Fig. 4D and fig. S24). Colocalization 
of cis-eQTLs and trans-eQTLs was widespread 
and often tissue specific, with Fig. 4E 
showing cis-eQTLs with at least 10 nominally 
significant colocalized trans-eQTLs each [pos- 
terior probability of colocalization (PP4) > 0.8 
and trans-eQTL P < 10~°], pinpointing how 
local effects on gene expression can potentially 
lead to downstream regulatory effects across the 
genome (fig. S25 and table S16). The many re- 
maining trans-eQTLs that do not coincide with 
a cis-eQTL may arise owing to mechanisms in- 
cluding undetected cis effects in specific cell types 
or conditions, protein coding changes, effects on 
cell type heterogeneity, or more complex causal- 
ity such as a variant that influences a trait with 
downstream consequences on gene expression. 
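
The enrichment of lead trans-eVariants among cis-eVariants reported above is based on a two-sided Fisher's exact test on a 2x2 table. The following is a minimal standard-library sketch of that test, not the authors' code. The counts in the example are hypothetical and unrelated to GTEx, and the function name `fisher_exact_two_sided` is an assumption. The two-sided P value is taken as the sum of probabilities of all tables (with the same margins) that are no more likely than the observed one, which is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, two-sided P value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so that ties with the observed table are included.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Hypothetical table: rows = lead trans-eVariant (yes / no),
# columns = significant cis-eVariant in the same tissue (yes / no).
odds_ratio, p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(odds_ratio, p_value)
```

With genome-scale counts the same function applies unchanged, because Python integers do not overflow; only the running time grows with the size of the margins.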


Genetic regulatory effects mediate complex 
trait associations 


To analyze the role of regulatory variants in 
genetic associations for human traits, we first 
asked whether variants in the GWAS catalog 
were enriched for significant QTLs compared 
with all variants tested for QTLs (11). We ob- 
served a 1.46-fold enrichment for cis-eQTLs 
(63% versus 43%) and 1.86-fold enrichment 
for cis-sQTLs (37% versus 20%). The enrich- 
ment was even stronger for trans-eQTLs [6.97- 
fold (0.029% versus 0.0042%)], consistent with 
other analyses (39) (Fig. 5A, fig. S26, tables S5 
and S6). Cell type proportion may influence 
detection of trans-eQTLs in heterogeneous 
tissues and may also be reflected in GWAS 
associations for blood cell count phenotypes 
and other complex traits. To minimize the 
possible impact of cell type heterogeneity on 
these enrichment statistics, we excluded blood 
cellularity traits and repeated these analyses. 
The resulting enrichments were 5.21-fold for 
trans-eQTLs, 1.43-fold for cis-eQTLs, and 1.81-fold 
for cis-sQTLs, largely preserving the patterns 
observed using the full set of GWAS traits. 
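
The fold enrichments quoted in this paragraph are ratios of two fractions: the share of GWAS catalog variants that are significant QTLs, divided by the share of all tested variants that are significant QTLs. A short sketch of the arithmetic, using only the rounded percentages printed in the text, is given below; the function name is an assumption. Because the inputs are rounded, the results agree with the reported values (1.46, 1.86, and 6.97) only approximately.

```python
def fold_enrichment(fraction_in_gwas, fraction_in_background):
    """Ratio of the QTL fraction among GWAS variants to that among all tested variants."""
    return fraction_in_gwas / fraction_in_background


# Rounded fractions as printed in the text: (GWAS catalog variants, all tested variants).
reported = {
    "cis-eQTL": (0.63, 0.43),
    "cis-sQTL": (0.37, 0.20),
    "trans-eQTL": (0.00029, 0.000042),
}

for qtl_type, (in_gwas, background) in reported.items():
    print(qtl_type, round(fold_enrichment(in_gwas, background), 2))
```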

This approach does not leverage the full 
power of GWAS and QTL association statistics, 
nor does it account for LD contamination, a 
situation wherein the causal variants for QTL 
and GWAS signals are distinct but LD between 
the two causal variants can suggest a false 
functional link (40). Therefore, for subsequent 
analyses (below) we selected 87 GWASs rep- 
resenting a broad array of binary and contin- 
uous complex traits that have summary results 
available in the public domain (11, 47). To match 
the ancestry of the GWASs, analyses were per- 
formed using cis-QTL statistics calculated from 
the European subset of GTEx donors (fig. S29). 
The analyses were performed for all pairwise 
combinations of 87 phenotypes and 49 tissues 
and are summarized using an approach that 
accounts for similarity between tissues and 
variable standard errors of the QTL effect es- 
timates, driven mainly by tissue sample size 
(fig. S27 and tables S4 and S11) (11). 

To analyze the mediating role of cis-regulation 
of gene expression on complex traits (35, 42), we 
used two complementary approaches, QTLEnrich 
(43) and stratified LD score regression (S-LDSC) 
(11, 44). To rule out the possibility that 
enrichment is driven by specific features of 
cis-QTLs such as allele frequency, distance to the 
transcription start site, or local level of LD 
[number of LD proxy variants; coefficient of 
determination (R²) = 0.5], we used QTLEnrich. 
We found a 1.46-fold (SE = 0.006) and 
1.56-fold (SE = 0.007) enrichment of trait 
associations among best cis-eQTLs and cis-sQTLs, 
respectively, adjusting for enrichment among 
matched null variants (Fig. 5A and table S7). 
The fact that these enrichment estimates 
differ little from those derived from the GWAS 
catalog overlap (above), even after accounting 
for the potential confounders, indicates how 
relatively robust these estimates are. Next, we 
used S-LDSC, adjusting for functional 
annotations (44), to confirm the robustness of these 
results and to analyze how GWAS enrichment 
is affected by the causal eVariant or sVariant 
being typically unknown (11). We computed the 
heritability enrichment of all cis-QTLs, fine- 
mapped cis-QTLs (in 95% credible set and 
posterior probability > 0.01 from dap-g), and 
fine-mapped cis-QTLs with maximum poste- 
rior inclusion probability as continuous anno- 
tation (45) (Fig. 5A). The largest increase in 
GWAS enrichment was for likely causal cis-QTL 
variants [11.1-fold (SE = 1.2) for cis-eQTLs and 
14.2-fold (SE = 2.4) for cis-sQTLs, for the con- 
tinuous annotation], which is strong evidence 
of shared causal effects of cis-QTLs and GWAS, 
and for the importance of fine-mapping. 

Joint enrichment analysis of cis-eQTLs and 
cis-sQTLs shows an independent contribution 
to complex trait variation from both (fig. S28) 
(11), consistent with their limited overlap (fig. 
S22). The relative GWAS enrichments of 
cis-sQTLs and cis-eQTLs were similar (Fig. 5A; not 
significant for the robust QTLEnrich and LDSC 
analyses), but the larger number of cis-eQTLs 
discovered (Fig. 2) suggests a greater aggregated 
contribution of cis-eQTLs. 

While these enrichment methods are pow- 
erful for genome-wide estimation of the QTL 
contribution to GWAS signals, they are not 
informative of regulatory mechanisms in indivi- 
dual loci. Thus, to provide functional interpreta- 
tion of the 5385 significant GWAS associations 
in 1167 loci from approximately independent LD 
blocks (46) across the 87 complex traits, we per- 
formed colocalization with ENLOC (32) to quan- 
tify the probability that the cis-QTL and GWAS 
signals share the same causal variant. We also 
assessed the association between the geneti- 
cally regulated component of expression or 
splicing and complex traits with PrediXcan 
(11, 41, 47). Both methods take multiple inde- 
pendent cis-QTLs into account, which is critical 
in large cis-eQTL studies with widespread allelic 
heterogeneity, such as GTEx. Of the 5385 GWAS 
loci, 43 and 23% were colocalized with a cis- 
eQTL and cis-sQTL, respectively (Fig. 5B). A 
large proportion of colocalized genes coincide 
with significant PrediXcan trait associations 
with predicted expression or splicing (median 
of 86 and 88% across phenotypes, respec- 
tively) (figs. S30 to S33 and tables S8 and S15), 
with the full resource available in (47). While 
colocalization does not prove a causal role of a 
QTL in any given locus nor a genome-wide 
proportion of GWAS loci driven by eQTLs, 
these results do suggest target genes and their 
potential molecular changes for thousands of 
GWAS loci, sometimes including both cis and 
trans targets (fig. S34). 

Having multiple independent cis-eQTLs for 
a large number of genes allowed us to test 
whether mediated effects of primary and sec- 
ondary cis-eQTLs on phenotypes—the ratio of 
GWAS and cis-eQTL effect sizes—are concor- 
dant. To ensure that concordance is not driven 
by residual LD between primary and second- 
ary signals, we used LD-matched cis-eGenes 
with low colocalization probability as controls 
(11, 41) and observed a significant increase in 
primary and secondary cis-eQTL concordance 
for colocalized genes (correlated t test P < 10-°°) 
(Fig. 5C). Additionally, colocalization of a cis- 
eQTL increased the colocalization of an inde- 
pendent cis-sQTL in the same locus (OR = 4.27, 
Fisher’s exact test P < 10") and, correspond- 
ingly, colocalization of a cis-sQTL increased 
cis-eQTL colocalization (OR = 4.54, Fisher’s 
exact test P < 10°) (figs. S35 and S36). These 
observations indicate that multiple regulatory 
effects for the same gene often mediate the 
same complex trait associations. Furthermore, 
genes with suggestive rare variant trait asso- 
ciations in the UK Biobank (48) have a sub- 
stantially increased proportion of colocalized 
eQTLs for the same trait (Fig. 5D and fig. S37), 
showing concordant trait effects from rare 
coding and common regulatory variants (49). 
These genes, as well as those with multiple 
colocalizing cis-QTLs, represent bona fide dis- 
ease genes with multiple independent lines 
of evidence. 
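
The concordance test described above compares the mediated effects of a gene's primary and secondary cis-eQTLs, where a mediated effect is the ratio of the GWAS effect size to the cis-eQTL effect size. The sketch below illustrates the idea with sign agreement as a deliberately simplified concordance measure; the study's actual statistic and its LD-matched controls are described in its supplementary methods. All effect sizes, gene labels, and function names here are hypothetical.

```python
def mediated_effect(beta_gwas, beta_eqtl):
    """Trait effect per unit of genetically driven expression implied by one eQTL."""
    if beta_eqtl == 0:
        raise ValueError("eQTL effect size must be non-zero")
    return beta_gwas / beta_eqtl


def sign_concordance(pairs):
    """Fraction of genes whose primary and secondary mediated effects share a sign."""
    agree = sum(1 for primary, secondary in pairs if (primary > 0) == (secondary > 0))
    return agree / len(pairs)


# Hypothetical (beta_gwas, beta_eqtl) for the primary and secondary eQTL of each gene.
genes = {
    "gene_1": ((0.04, 0.80), (0.02, 0.35)),
    "gene_2": ((-0.03, 0.50), (0.01, -0.20)),
    "gene_3": ((0.05, -0.60), (0.02, 0.40)),
}

pairs = [
    (mediated_effect(*primary), mediated_effect(*secondary))
    for primary, secondary in genes.values()
]
concordance = sign_concordance(pairs)
print(round(concordance, 3))
```

If a gene truly mediates the trait association, independent eQTLs for it should imply the same direction of effect, so concordance is expected to be higher for colocalized genes than for the LD-matched controls.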

The growing number of genome and phe- 
nome studies has revealed extensive pleiotropy, 
where the same variant or locus associates 
with multiple organismal phenotypes (50). 
We sought to analyze how this phenomenon 
can be driven by gene regulatory effects. First, 
we calculated the number of cis-eGenes of 
each fine-mapped and LD-pruned cis-eVariant 
per tissue at local false sign rate (LFSR) < 5%, 
with cross-tissue smoothing of effect sizes with 
mashr (11, 51). We observed that a median of 
57% of variants were associated with more than 
one gene per tissue, typically co-occurring across 
tissues, indicating widespread regulatory plei- 
otropy. Using a binary classification of cis- 
eVariants with regulatory pleiotropy defined 
as those associated with more than one gene, 
we observed that they are more significantly 
associated with complex traits compared with 
matched cis-eVariants (fig. S38). This could 
be due to the fact that if a variant regulates 
multiple genes, there is a higher probability 
that at least one of them affects a GWAS 
phenotype. 

However, cis-eVariants with regulatory 
pleiotropy also have higher GWAS complex trait 
pleiotropy (50) than cis-eVariants with effects 
on a single gene (Fig. 5E). This observation 
suggests a mechanism for complex trait 
pleiotropy of genetic effects where the expression 
of multiple genes in cis, rather than a single 
eGene effect, translates into diverse downstream 
physiological effects. Furthermore, GWAS 
pleiotropy is higher for tissue-shared (47) than 
tissue-specific cis-eQTLs, indicating that 
regulatory effects affecting multiple tissues are more 
likely to translate to diverse physiological traits 
(Fig. 5E). 

Fig. 6. Tissue specificity of cis-QTLs. (A) Tissue clustering with pairwise 
Spearman correlation of cis-eQTL effect sizes. (B) Similarity of tissue clustering 
across core data types quantified using median pairwise Rand index calculated 
across tissues. (C) Tissue activity of cis expression and splicing QTLs, where 
an eQTL was considered active in a tissue if it had a mashr local false sign rate 
(LFSR, equivalent to FDR) of <5%. This is shown for all cis-QTLs and only those that 
could be tested in all 49 tissues (red and blue). (D) Spearman correlation 
(corr.) between cis-eQTL effect size and eGene expression level across tissues. 
cis-eQTL counts are shown for those not tested owing to low expression (low 
expr.) level; tested but without significant (FDR < 5%) correlation (uncorrelated); 
a significant correlation, but effect sizes crossed zero, which made the correlation 
direction unclear (uninterpretable); positively correlated; and negatively correlated. 
(E and F) The effect of genomic function on cis-QTL tissue sharing modeled 
using logistic regression with functional annotations (E) and chromatin state (F). 
CTCF peak, motif, TF peak, and DHS (DNase I hypersensitive site) indicate whether 
the cis-QTL lies in a region annotated as having one of these features in any of 
the Ensembl Regulatory Build tissues. For chromatin states, model coefficients are 
shown for the discovery and replication tissues that have the same or different 
chromatin states. INDEL, insertion or deletion; ZNF, zinc finger; TSS, transcription 
start site; Transcr., transcription; Enh, enhancers. 

Fig. 7. Cell type interaction cis-eQTLs and cis-sQTLs. (A) Number of 
cell type interaction cis-eQTLs and cis-sQTLs (ieQTLs and isQTLs, respectively) 
discovered in seven tissue-cell type pairs, with shading indicating whether 
the ieGene or isGene was discovered by cis-eQTL or cis-sQTL analysis in bulk 
tissue. Colored dots are proportional to sample size. (B) Functional enrichment 
of neutrophil ieQTLs and isQTLs compared with cis-eQTLs and cis-sQTLs 
from whole blood. (C) Proportion of conditionally independent cis-eQTLs per 
eGene, for eGenes that do or do not have ieQTLs in GTEx, and for eGenes that 
have shared (=eQTLs) or nonshared (≠eQTLs) cis-eQTLs across five sorted blood 
cell types. (D) Whole blood cis-eQTL P value landscape for NCOA4, for the 
standard analysis (unconditional; top row) and for two independent cis-eQTLs 
(bottom rows). In a dataset of five sorted cell types (56), analyses of all cell 
types yielded a lead eVariant, rs2926494 (left), which is in high LD with 
the first independent cis-eQTL but not the second. The lead variant in monocyte 
cis-eQTL analysis, rs10740051, is in high LD with the second conditional 
cis-eQTL, indicating that this cis-eQTL is active specifically in monocytes. 
Thus, the full GTEx whole blood cis-eQTL pattern and allelic heterogeneity is 
composed of cis-eQTLs that are active in different cell types. (E) COLOC 
posterior probability (PP4) of GWAS colocalization with whole blood ieQTLs and 
eQTLs of the same eGene. Three hundred forty-nine gene-trait combinations 
across 132 genes and 36 GWAS traits showed evidence of colocalization 
(PP4 > 0.5) with an ieQTL and/or eQTL. 


Tissue specificity of genetic regulatory effects 


The GTEx data provide an opportunity to study 
patterns and mechanisms of tissue specificity of 
the transcriptome and its genetic regulation. 
Pairwise similarity of GTEx tissues was quanti- 
fied from gene expression and splicing, as well 
as allelic expression, eQTLs in cis and trans, 
and cis-sQTLs (Fig. 6A and fig. S41) (11). These 
estimates show consistent patterns of tissue 
relatedness, indicating that the biological pro- 
cesses that drive transcriptome similarity also 
control tissue sharing of genetic effects (Fig. 
6B). As seen in earlier versions of the GTEx 
data (9, 10), the brain regions form a separate 
cluster, and testis, lymphoblastoid cell lines, 
whole blood, and sometimes liver tend to be 
outliers, while most other organs have a notably 
high degree of similarity to each other. This 
indicates that blood is not an ideal proxy for 
most tissues and that some other relatively 
accessible tissues, such as skin, may better cap- 
ture molecular effects in other tissues. 

The overall tissue specificity of QTLs (11) 
follows a U-shaped curve, recapitulating pre- 
vious GTEx analyses (9, 10), where genetic 
regulatory effects tend to be either highly tissue 
specific or highly shared (Fig. 6C), with trans- 
eQTLs being more tissue specific than cis-eQTLs 
(fig. S40). Cis-sQTLs appear to be significantly 
more tissue specific than cis-eQTLs when con- 
sidering all mapped cis-QTLs, but this pattern 
is reversed when considering only those cis- 
QTLs where the gene or splicing event is 
quantified in all tissues (Fig. 6C and fig. S39). 
These observations indicate that splicing mea- 
sures are more tissue specific than gene ex- 
pression, but genetic effects on splicing tend to 
be more highly shared, which is consistent with 
pairwise tissue-sharing patterns (fig. S41). These 
opposite patterns are important for under- 
standing effects that disease-causing splicing 
variants may have across tissues and for vali- 
dation of splicing effects in cell lines that rarely 
are an exact match to cells in vivo. 

Next, we analyzed the sharing of AE across 
multiple tissues of an individual, which is a 
metric of sharing of any heterozygous regula- 
tory variant effects in that individual. Variation 
in AE has been useful for analysis of rare, 
potentially disease-causing variants (52). Using 
a clustering approach (11), we found that in 
97.4% of the cases, AE across all tissues 
forms a single cluster. This suggests that in 
AE analysis, different tissues are often rela- 
tively good proxies for one another, provided 
that the gene of interest is expressed in the 
probed tissue (fig. S42). 

We next computed the cross-tissue correla- 
tion of eQTL effect size and eGene expression 
level—often a proxy for gene functionality— 
and discovered that 1971 cis-eQTLs (7.4%; FDR 
5%) had a significant and robust correlation 
between eGene expression and cis-eQTL effect 
size across tissues (Fig. 6D and fig. S43). These 
correlated cis-eQTLs are split nearly evenly 
between negative (937) and positive (1034) 
correlations. Thus, the tissues with the highest 
cis-eQTL effect sizes are equally likely to be 
among tissues with higher or lower expression 
levels for the gene. Trans-eQTLs show a dif- 
ferent pattern, typically being observed in tissues 
with high expression of the trans-eGene relative 
to other tissues (fig. S43). 
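
The cross-tissue relationship between eQTL effect size and eGene expression described above is summarized with a Spearman correlation, which is a Pearson correlation computed on ranks. The following standard-library sketch shows the computation for one eQTL; the per-tissue values are hypothetical, and the helper names are assumptions rather than functions from the study's pipeline.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for one eGene across five tissues.
expression = [5.0, 12.0, 30.0, 44.0, 80.0]    # eGene expression level
effect_size = [1.4, 1.1, 0.9, 0.95, 0.3]      # cis-eQTL effect size (for example, aFC)
rho = spearman(expression, effect_size)
print(round(rho, 2))
```

A negative value, as in this toy example, corresponds to an eQTL whose effect is largest in the tissues where the gene is least expressed.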

These observations raise the question of how 
to prioritize the relevant tissues for eQTLs in 
a disease context. To address this, we chose a 
subset of GWAS traits with a strong prior indi- 
cation for the likely relevant tissue(s) (table S12). 
Analyzing colocalized cis-eQTLs for 1778 GWAS 
loci (11), we discovered that the relevant tissues 
were significantly enriched in having high 
expression and effect sizes (paired Wilcoxon 
sign test P < 1.5 x 10“), but the relatively weak 
signal indicates that pinpointing the likely rele- 
vant tissue for GWAS loci is challenging (figs. 
S44 and S45 and table S9). These results in- 
dicate that both effect sizes and gene expres- 
sion levels are important for interpreting the 
tissue context where an eQTL may have down- 
stream phenotypic effects. 

The diverse patterns of QTL tissue specificity 
raise the question of what molecular mecha- 
nisms underlie the ubiquitous regulatory ef- 
fects of some genetic variants and the highly 
tissue-specific effects of others. To gain insight 
into this question, we modeled cis-eQTL and 
cis-sQTL tissue specificity using logistic regres- 
sion as a function of the lead eVariant’s genomic 
and epigenomic context (11). Cis-QTLs where 
the top eVariant was in a transcribed region 
had overall higher sharing than those in clas- 
sical transcriptional regulatory elements, in- 
dicating that genetic variants with post- or 
cotranscriptional expression or splicing effects 
have more ubiquitous effects (Fig. 6E). Canon- 
ical splice and stop-gained variant effects had 
the highest probability of being shared across 
tissues, which may benefit disease-focused 
studies relying on likely gene-disrupting 
variants. 

We also considered whether varying reg- 
ulatory activity between tissues contributed 
to tissue specificity of genetic effects, and we 
found that shared chromatin states between 
the discovery and query tissues were associated 
with increased probability of cis-eQTL sharing 
and vice versa (Fig. 6F). Cis-eQTLs and cis-sQTLs 
followed similar patterns. Because cis-sQTLs 
are more enriched in transcribed regions and 
likely arise via posttranscriptional mechanisms 
(Fig. 4A), this is likely to contribute to their 
higher overall degree of tissue sharing (Fig. 
6C). In comparison to cis-eQTLs, cis-sQTLs are 
more often located in regions where regula- 
tory effects are shared. 

These data indicate a possible means by 
which we can predict whether a cis-eQTL ob- 
served in a GTEx tissue is active in another 
tissue of interest, using the variant’s anno- 
tation and properties in the discovery tissue 
(11). After incorporating additional features 
including cis-QTL effect size, distance to tran- 
scription start site, and eGene and sGene ex- 
pression levels, we obtain reasonably good 
predictions of whether a cis-QTL is active in 
a query tissue (median area under the curve = 
0.779 and 0.807, minimum = 0.703 and 0.721, 
maximum = 0.807 and 0.875 for cis-eQTLs and 
cis-sQTLs, respectively) (fig. S46). These re- 
sults suggest that it is possible to extrapolate 
the GTEx cis-eQTL catalog to additional tissues 
and potentially developmental stages, where 
population-scale data for QTL analysis are 
particularly difficult to collect. 


From tissues to cell types 


The GTEx tissue samples consist of heteroge- 
neous mixtures of multiple cell types. Hence, 
the RNA extracted and QTLs mapped from 
these samples reflect a composite of genetic 
effects that may vary across cell types and may 
mask cell type-specific mechanisms. To char- 
acterize the effect of cell type heterogeneity on 
analyses from bulk tissue, we used the xCell 
method (53) to estimate the enrichment of 64 
reference cell types from the bulk expression 
profile of each sample (11). Although these 
results need to be interpreted with caution 
given the scarcity of validation data (54), the 
resulting enrichment scores were generally 
biologically meaningful with, for example, 
myocytes enriched in heart left ventricle and 
skeletal muscle; hepatocytes enriched in liver; 
and various blood cell types enriched in whole 
blood, spleen, and lung, which harbors a large 
leukocyte population (fig. S47). Interestingly, 
the pairwise relatedness of GTEx tissues de- 
rived from their cell type composition is highly 
correlated with tissue sharing of regulatory 
variants (cis-eQTL versus cell type composition 
Rand index = 0.92) (Fig. 6B and figs. S48 and 
S41), suggesting that similarity of regulatory 
variant activity between tissue pairs may 
often be due to the presence of similar cell 
types and not necessarily shared regulatory 
networks within cells. This observation high- 
lights the key role that characterizing cell 
type diversity will have for understanding 
not only tissue biology but genetic regulatory 
effects as well. 
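
The Rand index quoted above measures how similarly two clusterings group the same set of items: it is the fraction of item pairs that are either placed together in both clusterings or placed apart in both. The sketch below computes it for two clusterings of six tissues. The cluster labels are hypothetical, and the function is a plain illustration of the index, not the median pairwise procedure used for Fig. 6B.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree (together or apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        1
        for i, j in pairs
        if (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
    )
    return agreements / len(pairs)


# Hypothetical cluster assignments for six tissues from two data types.
clusters_from_eqtls = [0, 0, 0, 1, 1, 2]
clusters_from_cell_types = [0, 0, 1, 1, 1, 2]
ri = rand_index(clusters_from_eqtls, clusters_from_cell_types)
print(round(ri, 3))
```

A value of 1 means the two clusterings are identical up to relabeling, so an index of 0.92 indicates that tissue groupings from cis-eQTL sharing and from cell type composition agree on most tissue pairs.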

Enrichment of many cell types shows inter- 
individual variation within a given tissue, 
partially owing to tissue sampling variation 
between individuals. This variation can be 
leveraged to identify cis-eQTLs and cis-sQTLs 
with cell type specificity by including an 
interaction between genotype and cell type 
enrichment in the QTL model (11, 55). We applied 
this approach to seven tissue-cell type pairs with 
robustly quantified cell types in the tissue where 
each cell type was most enriched (Fig. 7A) [an 
additional 36 pairs are described in (54)]. The 
largest numbers of cell type interaction cis- 
eQTLs and cis-sQTLs (ieQTLs and isQTLs, re- 
spectively) were 1120 neutrophil ieQTLs and 
169 isQTLs in whole blood and 1087 epithelial 
cell ieQTLs and 117 isQTLs in transverse colon 
(Fig. 7A). Of these ieQTLs, 76 and 229, respec- 
tively, corresponded to an eGene for which no 
QTL was detected in bulk tissue. 
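
The interaction approach described above adds a genotype-by-cell-type-enrichment term to the QTL regression, so that a non-zero interaction coefficient indicates a genetic effect that changes with cell type abundance. The following is a minimal ordinary-least-squares sketch of such a model, written with the standard library only. It assumes a model of the form expression = b0 + b_g·genotype + b_c·enrichment + b_gc·genotype·enrichment, uses noise-free hypothetical data, and omits the covariates and significance testing that a real analysis requires. All function and variable names are assumptions.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_interaction_model(genotype, enrichment, expression):
    """OLS fit via the normal equations; returns [b0, b_g, b_c, b_gc]."""
    rows = [[1.0, g, c, g * c] for g, c in zip(genotype, enrichment)]
    p = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, expression)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical samples: genotype dosage (0, 1, 2) and cell type enrichment score.
genotype = [0, 1, 2, 0, 1, 2, 0, 1, 2]
enrichment = [0.1, 0.1, 0.1, 0.5, 0.5, 0.5, 0.9, 0.9, 0.9]
# Simulated expression in which the genotype effect grows with enrichment.
expression = [1.0 + 0.5 * g + 2.0 * c + 0.8 * g * c for g, c in zip(genotype, enrichment)]

coefficients = fit_interaction_model(genotype, enrichment, expression)
print([round(b, 3) for b in coefficients])
```

The last coefficient is the interaction term. As the text notes for ieQTLs and isQTLs, a significant interaction does not by itself prove that the effect is specific to the named cell type, because enrichment scores of different cell types can be correlated.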

We validated these effects using published 
eQTLs from purified blood cell types (56), 
where neutrophil eQTLs had higher neutro- 
phil ieQTL effect sizes than eQTLs from other 
blood cell types (fig. S49). For other cell types, 
external replication data was not available. 
Thus, we verified the robustness of the ieQTLs 
by the allelic expression validation approach 
that was used for sex- and population-biased 
cis-eQTL analyses: For ieQTL heterozygotes, we 
calculated the Spearman correlation between 
cell type enrichment and ieQTL effect size 
from AE data and observed a high validation 
rate (54). Note that ieQTLs and isQTLs should 
not be considered cell type-specific QTLs, be- 
cause the enrichment of any cell type may be 
(anti)correlated with other cell types (fig. S50). 
While full deconvolution of cis-eQTL effects 
driven by specific cell types remains a chal- 
lenge for the future, ieQTLs and isQTLs can 
be interpreted as being enriched for cell type- 
specific effects. 

In most subsequent analyses to characterize 
the properties of ieQTLs and isQTLs, we focused 
on neutrophil ieQTLs, which are numerous 
and supported by external replication data. 
Functional enrichment analyses of these QTLs 
show that they largely follow the enrichment 
patterns observed for bulk tissue cis-QTLs 
(Fig. 7B). However, ieQTLs are more strongly 
enriched in promoter-flanking regions and 
enhancers, which are known to be major 
drivers of cell type-specific regulatory effects 
(2). Epithelial cell ieQTLs yielded similar pat- 
terns (fig. S51). 

We hypothesized that the widespread allelic 
heterogeneity observed in the bulk tissue cis- 
eQTL data could be partially driven by an 
aggregate signal from cis-eQTLs that are each 
active in a different cell type present in the 
tissue. Indeed, the number of cis-eQTLs per 
gene is higher for ieGenes than for standard 
eGenes, especially in skin and blood (Fig. 7C). 
While differences in power could contribute 
to this pattern, it is corroborated by eGenes 
that have independent cis-eQTLs (R² < 0.05) 
in five purified blood cell types (56) also 
showing an increased amount of allelic heter- 
ogeneity in GTEx whole blood (Fig. 7, C and 
D). Thus, quantifying cell type specificity 
can provide mechanistic insights into the 
genetic architecture of gene expression and 
may be leveraged to improve the resolution 
of complex patterns of allelic heterogeneity 
wherein we can distinguish effects manifest- 
ing in different cell types. 
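
As a rough illustration of the independence criterion quoted above (R² < 0.05), the following sketch computes the squared Pearson correlation between two genotype dosage vectors. The variant data are hypothetical, and dosage-based R² is used here as a simple stand-in for whatever linkage disequilibrium estimate the analysis actually relied on.

```python
def ld_r_squared(dosage_a, dosage_b):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    n = len(dosage_a)
    mean_a = sum(dosage_a) / n
    mean_b = sum(dosage_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosage_a, dosage_b))
    var_a = sum((a - mean_a) ** 2 for a in dosage_a)
    var_b = sum((b - mean_b) ** 2 for b in dosage_b)
    return cov * cov / (var_a * var_b)

R2_THRESHOLD = 0.05  # independence threshold quoted in the text

# Hypothetical dosages for three variants across nine donors.
variant_1 = [0, 1, 2, 0, 1, 2, 0, 1, 2]
variant_2 = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # uncorrelated with variant_1
variant_3 = [0, 1, 2, 0, 1, 2, 0, 1, 1]  # nearly identical to variant_1

independent_12 = ld_r_squared(variant_1, variant_2) < R2_THRESHOLD
independent_13 = ld_r_squared(variant_1, variant_3) < R2_THRESHOLD
```

Under this criterion, variant_1 and variant_2 would count as independent signals, whereas variant_1 and variant_3 would not.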

Next, we analyzed how cell type interaction 
cis-QTLs contribute to the interpretation of 
regulatory variants underlying complex dis- 
ease risk. GWAS colocalization analysis of 
neutrophil ieQTLs (11) revealed multiple loci 
(111, ~32%) that colocalize only with ieQTLs 
and not with whole blood cis-eQTLs (Fig. 7E), 
although 75% (42 of 56) of the corresponding 
eGenes have both cis-eQTLs and ieQTLs. Im- 
proved resolution into allelic heterogeneity ap- 
pears to contribute to colocalization exclusively 
with ieQTLs. For example, the absence of colo- 
calization between a platelet count GWAS signal 
and bulk tissue cis-eQTL for SPAG7 appears to 
be due to the whole blood signal being an ag- 
gregate of multiple independent signals (fig. 
S52). The neutrophil ieQTL analysis uncovers a 
specific signal that mirrors the GWAS associa- 
tion, suggesting that platelet counts are affected 
by SPAG7 expression only in one or several 
specific cell types. Thus, in addition to previously 
undetected colocalizations pinpointing poten- 
tial causal genes, ieQTL analysis has the po- 
tential to provide insights into cell type-specific 
mechanisms of complex traits. 


Outlook 


The GTEx v8 data release represents a deep 
survey of both intra- and interindividual trans- 
criptome variation across a large number of 
tissues. With 838 donors and 15,201 samples— 
approximately twice the size of the v6 release 
used in the previous set of GTEx Consortium 
papers—we have created a comprehensive re- 
source of genetic variants that influence gene 
expression and splicing in cis. This substan- 
tially expands and updates the GTEx catalog of 
sQTLs, doubles the number of eGenes per 
tissue, and saturates the discovery of eQTLs 
with greater than twofold effect sizes in ~40 
tissues. The fine-mapping data of GTEx cis- 
eQTLs provide a set of thousands of likely 
causal functional variants. While trans-QTL 
discovery and the characterization of sex- and 
population-specific genetic effects are still limited 
by sample size, analyses of the v8 data provide 
important insights into each. 

Cell type interaction cis-eQTLs and cis- 
sQTLs, mapped with computational estimates 
of cell type enrichment, constitute an impor- 
tant extension of the GTEx resource to effects 
of cell types within tissues. The highly similar 
tissue-sharing patterns across these data types 
suggest shared biology from cell type composi- 
tion to transcriptome variation and genetic re- 
gulatory effects. Our results indicate that shared 
cell types between tissues may be a key factor 
behind tissue sharing of genetic regulatory 
effects, which will constitute a key challenge to 
tackle in the future. Finally, GWAS colocaliza- 
tion with cis-eQTLs and cis-sQTLs provides rich 
opportunities for further functional follow- 
up and characterization of regulatory mecha- 
nisms of GWAS associations. 

Given the very large number of cis-eQTLs, 
the extensive allelic heterogeneity—multiple 
independent regulatory variants affecting the 
same gene—is unsurprising. With well-powered 
cis-QTL mapping, it becomes possible and 
important to describe and disentangle these 
effects; the assumption of a single causal variant 
in a cis-eQTL locus no longer holds true for 
datasets of this scale. Similarly, we highlight 
cis-eQTL and cis-sQTL effects on the same 
gene, typically driven by distinct causal variants 
(4, 35). The joint complex trait contribution of 
independent cis-eQTLs and cis-sQTLs and that 
of cis-eQTLs and rare coding variants for the 
same gene highlights how different genetic 
variants and functional perturbations can 
converge at the gene level to similar physio- 
logical effect. This orthogonal evidence pin- 
points highly likely causal disease genes, and 
these associations could be leveraged to build 
allelic series, a powerful tool for estimating 
dosage-risk relationship for the purposes of 
drug development (57). 

Finally, we provide mechanistic insights 
into the cellular causes of allelic heterogene- 
ity, showing the separate contributions from 
cis-eQTLs active in different cell types to 
the combined signal seen in a bulk tissue 
sample. With evidence that this increased 
cellular resolution improves colocalization in 
some loci, cell type-specific analyses appear 
particularly promising for finer dissection of 
genetic association data. 

Integration of GTEx QTL data and func- 
tional annotation of the genome provides 
powerful insights into the molecular mech- 
anisms of transcriptional and posttranscriptional 
regulation that affect gene expression levels 
and splicing. A large proportion of cis-eQTL 
effects are driven by genetic perturbations in 
classical regulatory elements of promoters 
and enhancers. However, the magnitude of 
these enrichments is perhaps unexpectedly 
modest, which likely reflects the fact that only 
a small fraction of variants in these large re- 
gions have true regulatory effects, leading to a 
lower resolution of annotating functional va- 
riants compared with the nucleotide-level an- 
notation of, for example, nonsense or canonical 
splice site variants. Context-specific genetic ef- 
fects of tissue-specific and cell type interaction 
cis-eQTLs are enriched in enhancers and re- 
lated elements and their variable activity across 
tissues and cell types. 

While cis-eQTLs are enriched for a wide 
range of functional regions, the vast majority of 
cis-sQTLs are located in transcribed regions, 
with likely cotranscriptional and/or post- 
transcriptional regulatory effects. Interest- 
ingly, these appear to be less tissue specific, 
which likely contributes to the higher tissue 
sharing of cis-sQTLs than cis-eQTLs. The 
higher tissue sharing of all cotranscriptional 
or posttranscriptional regulatory effects may 
facilitate interpretation of potentially dis- 
ease-related functional effects of (rare) coding 
variants triggering nonsense-mediated decay 
or splicing changes, even when the disease- 
relevant tissues are not available. 

About a third of the observed trans-eQTLs 
are mediated by cis-eQTLs, demonstrating how 
local genetic regulatory effects can translate to 
effects at the level of cellular pathways. All types 
of QTLs that were studied are strong mediators 
of genetic associations to complex traits, with a 
higher relative enrichment for cis-sQTLs than 
cis-eQTLs and with trans-eQTLs having the 
highest enrichment of all (35). With large 
genome- and phenome-wide studies having 
uncovered extensive pleiotropy of complex trait 
associations, the GTEx data provide important 
insights into the molecular underpinnings of 
this observed pleiotropy: Variants that affect 
the expression of multiple genes and multiple 
tissues have a higher degree of complex trait 
pleiotropy, indicating that some of the plei- 
otropy arises at the proximal regulatory level. 
Dissecting this complexity and pinpointing truly 
causal molecular effects that mediate specific 
phenotype associations will be a considerable 
challenge for the future. 

This study of the GTEx v8 data has provided 
insights into genetic regulatory architecture 
and functional mechanisms. The catalog of 
QTLs and associated datasets of annotations, 
cell type enrichments, and GWAS summary 
statistics requires careful interpretation but 
provides insights into the biology of gene reg- 
ulation and functional mechanisms of com- 
plex traits. We demonstrate how QTL data can 
be used to inform on multiple aspects of GWAS 
interpretation: potential causal variants from 
fine-mapping, proximal regulatory mechanisms, 
target genes in cis, and pathway effects in 
trans, in the context of multiple tissues and 
cell types. However, our understanding of ge- 
netic effects on cellular phenotypes is far from 
complete. We envision that further investiga- 
tion into genetic regulatory effects in specific 
cell types, study of additional tissues and de- 
velopmental time points not covered by GTEx, 
incorporation of a diverse set of molecular 
phenotypes, and continued investment in in- 
creasing sample sizes from diverse populations 
will continue to provide transformative scien- 
tific discoveries. 

INTRODUCTION: Many complex human pheno- 
types, including diseases, exhibit sex-differentiated 
characteristics. These sex differences have been 
variously attributed to hormones, sex chromo- 
somes, genotype × sex effects, differences in 
behavior, and differences in environmental 
exposures; however, their mechanisms and 
underlying biology remain largely unknown. 
The Genotype-Tissue Expression (GTEx) proj- 
ect provides an opportunity to investigate the 
prevalence and genetic mechanisms of sex 
differences in the human transcriptome by 
surveying many tissues that have not previ- 
ously been characterized in this manner. 


RATIONALE: To characterize sex differences in 
the human transcriptome and its regulation, and 
to discover how sex and genetics interact to in- 
fluence complex traits and disease, we generated 
a catalog of sex differences in gene expression 
and its genetic regulation across 44 human tis- 
sue sources surveyed by the GTEx project (v8 
data release), analyzing 16,245 RNA-sequencing 
samples and genotypes of 838 adult individuals. 
We report sex differences in gene expression lev- 
els, tissue cell type composition, and cis expres- 
sion quantitative trait loci (cis-eQTLs). To assess 
their impact, we integrated these results with 
gene function, transcription factor binding an- 
notation, and genome-wide association study 
(GWAS) summary statistics of 87 GWASs. 

Sex affects gene expression and its genetic regulation across tissues. Sex effects on gene expression were 
measured in 44 GTEx human tissue sources and integrated with genotypes of 838 subjects. Sex-biased expression is 
present in numerous biological pathways and is associated with sex-differentiated transcriptional regulation. Sex-biased 
expression quantitative trait loci in cis (sex-biased eQTLs) are partially mediated by cellular abundances and reveal gene- 
trait associations. TT, AT, and AA are genotypes for a single-nucleotide polymorphism; TF, transcription factor. 


RESULTS: Sex effects on gene expression are 
ubiquitous (13,294 sex-biased genes across all tis- 
sues). However, these effects are small and large- 
ly tissue-specific. Genes with sex-differentiated 
expression are not primarily driven by tissue- 
specific gene expression and are involved in a di- 
verse set of biological functions, such as drug and 
hormone response, embryonic development and 
tissue morphogenesis, fertilization, sexual repro- 
duction and spermatogenesis, fat metabolism, 
cancer, and immune response. Whereas X-linked 
genes with higher expression in females suggest 
candidates for escape from X-chromosome in- 
activation, sex-biased expression of autosomal 
genes suggests hormone-related transcription 
factor regulation and a role for additional tran- 
scription factors, as well as sex-differentiated 
distribution of epigenetic marks, particularly 
histone H3 Lys27 trimethylation (H3K27me3). 

Sex differences in the genetic regulation of 
gene expression are much less common (369 sex- 
biased eQTLs across all tissues) and are highly 
tissue-specific. We identified 58 gene-trait associ- 
ations driven by genetic regulation of gene ex- 
pression in a single sex. These include loci where 
sex-differentiated cell type abundances mediate 
genotype-phenotype associations, as well as loci 
where sex may play a more direct role in the 
underlying molecular mechanism of the asso- 
ciation. For example, we identified a female- 
specific eQTL in liver for the hexokinase HKDC1 
that influences glucose metabolism in pregnant 
females, which is subsequently reflected in the 
birth weight of the offspring. 


CONCLUSION: By integrating sex-aware analyses 
of GTEx data with gene function and transcription 
factor binding annotations, we describe tissue- 
specific and tissue-shared drivers and mechanisms 
contributing to sex differences in the human 
transcriptome and eQTLs. We discovered multi- 
ple sex-differentiated genetic effects on gene 
expression that colocalize with complex trait 
genetic associations, thereby facilitating the 
mechanistic interpretation of GWAS signals. 
Because the causative tissue is unknown for 
many phenotypes, analysis of the diverse GTEx 
tissue collection can serve as a powerful resource 
for investigations into the basis of sex-biased 
traits. This work provides an extensive char- 
acterization of sex differences in the human 
transcriptome and its genetic regulation. 

Many complex human phenotypes exhibit sex-differentiated characteristics. However, the molecular 
mechanisms underlying these differences remain largely unknown. We generated a catalog of sex 
differences in gene expression and in the genetic regulation of gene expression across 44 human 
tissue sources surveyed by the Genotype-Tissue Expression project (GTEx, v8 release). We 
demonstrate that sex influences gene expression levels and cellular composition of tissue samples 
across the human body. A total of 37% of all genes exhibit sex-biased expression in at least one 
tissue. We identify cis expression quantitative trait loci (eQTLs) with sex-differentiated effects and 
characterize their cellular origin. By integrating sex-biased eQTLs with genome-wide association study 
data, we identify 58 gene-trait associations that are driven by genetic regulation of gene expression in 
a single sex. These findings provide an extensive characterization of sex differences in the human 
transcriptome and its genetic regulation. 


Many complex human phenotypes, such 
as anthropometric traits (e.g., waist- 
to-hip ratio), exhibit sex-differentiated 
distributions; disease features such as 
prevalence, progression, age of onset, 
and response to treatment often differ by sex 
(1-5). These sex differences have been variously 
attributed to hormones, sex chromosomes, 
genotype × sex effects, differences in behavior, 
and differences in environmental exposures 
(6), but the mechanisms and underlying biol- 
ogy of the sex differences remain largely un- 
known. The Genotype-Tissue Expression (GTEx) 
project (7) provides an opportunity to investigate 
the prevalence and genetic mechanisms of sex 
differences in transcriptomes and to identify 
how sex and genetics interact to influence com- 
plex traits and disease. The analyses presented 
here characterize sex differences in a relatively 
large population sample, including many tis- 
sues that generally lack characterization. Be- 
cause the causative tissue is unknown for many 
diseases and disorders, analysis of this diverse 
tissue set can serve as a powerful resource for 
investigations into the basis of sex-differentiated 
phenotypes. 

We present an extensive characterization of 
sex differences in the human transcriptome 
across 44 tissue sources of the GTEx project 
[v8 data release (8)] from 838 individuals 
(557 males, 281 females), constituting a large 
collection of multi-tissue bulk gene expression 
and genotype data (Fig. 1) (9). We quantify and 
characterize sex differences in gene expression 
levels (sex-biased gene expression) and cis sex- 
biased expression quantitative trait loci (sb- 
eQTLs). By incorporating the results of these 
sex-aware analyses of GTEx data with gene 
features and transcription factor binding an- 
notation, we describe tissue-specific and tissue- 
nonspecific drivers and mechanisms contributing 
to sex differences in the human transcriptome 
and eQTLs. By integrating data from genome- 
wide association studies (GWASs), we report 
multiple sex-differentiated genetic effects on 
the transcriptome that colocalize with complex 
trait associations, highlighting the power of 
characterizing sex bias in GTEx samples for the 
mechanistic interpretation of GWAS signals. 


Sex effects on gene expression are ubiquitous 
but small 


Using GTEx v8 data (table S1), we quantified 
sex-biased gene expression in each of the 44 
tissue sources for all genes expressed in at least 
one tissue. We considered a total of 35,431 
X-linked and autosomal genes, including pro- 
tein coding, long intergenic noncoding RNA 
(lincRNA), and other less-characterized gene 
types such as transcribed pseudogenes (9). 
For each tissue, we first fit a linear model that 
accounts for known sample and donor char- 
acteristics, as well as surrogate variables that 
capture hidden technical or biological factors 
of expression variability, including tissue cell 
type composition (fig. S1, A to C). Consequently, 
we are able to identify sex-biased gene expres- 
sion that does not derive from sex differences in 
cell type abundances. We next modeled sex bias 
effects across tissues. We discovered a total of 
13,294 differentially expressed genes [sex-biased 
genes; local false sign rate (LFSR) < 0.05], with 
473 to 4558 genes discovered per tissue, rep- 
resenting 1.3% to 12.9% of all tested genes, 
respectively (Fig. 2A, fig. S1, D to F, and table S2). 
Previous studies have reported widespread 
sex-biased gene expression (10-12) and de- 
scribed breast as the most sex-differentiated 
tissue (10, 11, 13). However, we did not observe 
this in the present study after controlling for 
sex differences in tissue cell type composition 
(fig. S1A). We next assessed replication of sex- 
biased genes in independent gene expression 
datasets for four tissues (brain cerebellum, 
brain cortex, heart left ventricle, and lympho- 
cytes; table S2). We observed moderate to strong 
replication (average π1 = 0.62, average effect size 
Spearman’s ρ = 0.78). In total, 37.5% (13,294/ 
35,431) of the human transcriptome was differ- 
entially expressed in at least one tissue. Of these, 
531 genes (4%) were X-linked and 12,763 genes 
(96%) were autosomal, representing 47% and 
37% of all tested X-linked and autosomal genes, 
respectively. Although abundant, sex effects 
were mostly small (fig. S2A), particularly for 
autosomal genes (9) (fig. S2B). X-linked genes 
with higher expression in females (female-biased 
genes) exhibited larger sex effects [median fold 
change (FC) = 1.13] than either X-linked genes 
with higher expression in males (male-biased 
genes; median FC = 1.08) or autosomal sex- 
biased genes (median FC = 1.04 for both female- 
and male-biased genes; fig. S2B), potentially as 
a result of escape from X-chromosome inactiva- 
tion (XCI) (14). The number of sex-biased genes 
and the effect sizes were not dominated by either 
sex (fig. S2C). 

Fig. 1. Sample, data types, and discovery sets in the study of sex differences in GTEx v8. Tissue types 
(including 11 distinct brain regions and two cell lines) are illustrated, with sample numbers from GTEx v8 
genotyped donors (females:males, in parentheses) and color coding indicated for each. This study included 
N = 44 tissue sources present in both sexes with ≥70 samples. Tissue sources comprised two cell lines, 
40 tissues, and two additional replicates for brain cerebellum and cortex tissues. Tissue name abbreviations 
are shown in bold. See (9) for specific numbers of donors used in each analysis. 
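
The counts and percentages reported in this section can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the text; the variable names are illustrative, and the per-tissue percentages are assumed to use all 35,431 tested genes as the denominator.

```python
# Counts quoted in the text.
total_tested_genes = 35431
sex_biased_genes = 13294
x_linked_sex_biased = 531
autosomal_sex_biased = 12763
min_per_tissue, max_per_tissue = 473, 4558

# The X-linked and autosomal counts should add up to the total.
counts_consistent = (x_linked_sex_biased + autosomal_sex_biased == sex_biased_genes)

# Percentages, rounded as in the text.
pct_transcriptome = round(100 * sex_biased_genes / total_tested_genes, 1)
pct_x_linked = round(100 * x_linked_sex_biased / sex_biased_genes)
pct_autosomal = round(100 * autosomal_sex_biased / sex_biased_genes)
pct_min_tissue = round(100 * min_per_tissue / total_tested_genes, 1)
pct_max_tissue = round(100 * max_per_tissue / total_tested_genes, 1)

print(counts_consistent, pct_transcriptome, pct_x_linked, pct_autosomal,
      pct_min_tissue, pct_max_tissue)
```

The rounded values agree with the 37.5%, 4%, 96%, 1.3%, and 12.9% figures given above.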


Sex-biased gene expression is largely 
tissue-specific 

Sex-biased genes exhibited a skewed pattern of 
tissue sharing; they were likely to be differen- 
tially expressed in only a small subset of tissues 
(Fig. 2B), as previously reported (10-13). Of 
13,294 total sex-biased genes, 2416 (18.2%) 
were differentially expressed in only a single 
tissue (Fig. 2B), suggesting tissue-dependent 
regulation. Only 30 genes (0.23%), 22 of which 
are known constitutive XCI escapees (table S3), 
exhibited consistent sex bias across all 44 tis- 
sue sources (Fig. 2B). This tissue specificity 
did not simply reflect patterns of gene expres- 
sion across tissues; sex-biased genes tended 
to be ubiquitously expressed across tissues, 
whereas sex-biased expression was limited to 
one or a few tissues (9) (Fig. 2C and fig. S2D). 
The majority (8241/10,878 genes, 76%) of genes 
with sex bias in two or more tissues exhibited 
consistent effect direction across tissues, espe- 
cially for X-linked genes (fig. S2E). Notably, 
whole blood and cell lines, the most widely 
studied biospecimen types, were not represent- 
ative of sex-biased expression across tissues; 
sex-biased genes in whole blood constituted 
only 12.9% (1710/13,294) of all sex-biased genes. 
Although hierarchical clustering of tissues 
based on gene expression and on sex-biased 
expression is highly concordant (cophenetic 
correlation coefficient = 0.75) (9) (Fig. 2C and 
fig. S3, A to C), the intersection between the 
cluster-defining gene sets (table S4) is less 
than expected by chance (P < 2.2 × 10⁻¹⁶, hy- 
pergeometric test). For example, both gene 
expression and sex-biased expression supported 
a cluster of brain subregions that is clearly dif- 
ferentiated from other tissues (Fig. 2C and fig. 
S3, B and D). However, the cluster based on sex- 
biased expression was driven by 194 genes, 
whereas the transcriptome-based brain cluster 
was driven by 982 genes, from which only six 
were common with those defining the sex-based 
brain cluster. Among drivers of the sex-based 
liver cluster, we identified CYP450 genes— 
CYP1A2, CYP3A7, CYP3A4—as previously reported 
(15), but we also found genes less well char- 
acterized for sex bias, such as PZP, H19, and 
VWCE, which were previously shown to be sex- 
differentially expressed as a result of liver- 
specific sex differences in DNA methylation 
(16). These results suggest that the tissue 
specificity of sex-biased expression is not driven 
primarily by tissue-specific gene expression. 
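
A hypergeometric test of whether two gene sets overlap less than expected by chance can be sketched as follows. The gene universe size is not stated in the text, so the numbers below are hypothetical toy values rather than the study's data, and the function names are illustrative.

```python
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """P(X = k) when drawing `draws` genes from `population`, of which `successes` are in set A."""
    return comb(successes, k) * comb(population - successes, draws - k) / comb(population, draws)

def hypergeom_lower_tail(k, population, successes, draws):
    """P(X <= k): probability of an overlap this small or smaller."""
    lowest = max(0, draws - (population - successes))
    return sum(hypergeom_pmf(i, population, successes, draws) for i in range(lowest, k + 1))

# Hypothetical example: 1000 genes in the universe, set A has 200 genes,
# set B has 100 genes, and only 5 genes are shared.
population, set_a, set_b, overlap = 1000, 200, 100, 5

expected_overlap = set_a * set_b / population  # 20 genes expected by chance
p_depletion = hypergeom_lower_tail(overlap, population, set_a, set_b)
```

Here an overlap of 5 against an expectation of 20 gives a small lower-tail probability, the same direction of effect as the "less than expected by chance" result described above.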


X-linked female-biased genes accurately 
predict sex and suggest tissue-specific 
candidates for escape from X-chromosome 
inactivation 


We accurately predicted sex from gene ex- 
pression, as previously explored (17), using X- 
linked genes (9) (fig. S4, A to D) with gradient 
boosted trees. Although the most predictive 
X-linked genes (fig. S4E) are those known to 
escape XCI, we identified 40 X-linked female- 
biased genes predictive of sex (within the top 
tertile with respect to their Shapley values) not 
previously described as XCI escapees (table S3). 
These results suggest further evaluation of these 
genes as potential XCI escapees; we did not 
directly test escape from XCI, and female-biased 
expression of X-linked genes may originate 
from other mechanisms. Sex prediction from 
autosomal genes was less accurate (mean ac- 
curacy = 84%), less specific (mean specificity = 
56%, sensitivity = 96%; fig. S4D), and required 
more genes (fig. S4F) than prediction based on 
X-linked genes. However, in two tissues—breast 
and muscle—autosomal genes predicted sex with 
specificity ≥ 90% and sensitivity = 98% (fig. S4G). 

Fig. 2. Sex-differential gene expression. (A) Number of sex-differentially expressed genes (sex-biased genes) per tissue. Tissue colors are as in Fig. 1. (B) Sex- 
biased gene discovery (histogram, number of sex-biased genes) and characteristics of sex-biased genes (stacked bar plots) as a function of tissue sharing. 
Proportions of X-linked and autosomal sex-biased genes (Chr.) and of female- and male-biased genes (Sign) are indicated. (C) Hierarchical clustering of tissues based 
on gene expression (left) and the effect size of sex-biased genes (right). See (9) for further details. 
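
The accuracy, sensitivity, and specificity quoted for sex prediction from autosomal genes are related through the confusion matrix of a classifier. The sketch below uses hypothetical confusion counts chosen to give 96% sensitivity and 56% specificity; which sex is treated as the positive class is an assumption here, as the text does not say.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of positive-class samples recovered
    specificity = tn / (tn + fp)   # fraction of negative-class samples recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity

# Hypothetical counts: 200 positive-class and 100 negative-class samples.
accuracy, sensitivity, specificity = classification_metrics(tp=192, fn=8, tn=56, fp=44)
```

With a 2:1 class imbalance, which is close to the 557 male and 281 female donors in the cohort, overall accuracy stays above 80% even though specificity is low, because the majority class dominates the accuracy calculation.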


Sex-biased genes exhibit nonrandom and 
tissue-specific genomic distribution 

Except for the enrichment of female-biased 
genes on the X chromosome, little is known 
about the genome-wide distribution of sex- 
biased genes. We applied a positional gene 
enrichment analysis method (18) separately 
for male- and female-biased genes (LFSR < 
0.05) from each tissue (9) (fig. S5A). We dis- 
covered clustering of a total of 1559 sex-biased 
genes in 134 autosomal and five X-linked re- 
gions (P < 0.001, hypergeometric test) (Fig. 3A 
and table S5). On the X chromosome, pseudo- 
autosomal region PAR1 and the remainder of 
the X-chromosome short arm p were enriched 
for male-biased and female-biased genes, re- 
spectively (Fig. 3A, right), as previously re- 
ported (14). Female-biased gene enrichment 
was stronger (Spearman’s ρ = 0.51, P = 1.63 × 
10°’) in the younger strata of arm p (fig. S5B), 
likely driven by escape from XCI (14, 19). Al- 
though enriched X-chromosome regions spanned 
~126 Mb, only 25% of subregions were en- 
riched in at least two-thirds of the tissues. 
Among autosomal sex-biased genes, we ob- 
served a cluster of male-biased genes on chro- 
mosome 20 that was identified in 70% (30/44) 
of tissues (fig. S5C), but the majority of the 
134 autosomal enriched regions were tissue- 
specific, identified on average in ~7% (3/44) 
of tissues (fig. S5D and table S5). These results 
are compatible with tissue-variable escape from 
XCI (14, 20) and with tissue-specific topolog- 
ically associating domains, possibly mediated 
by hormones (21). Further investigation is war- 
ranted to corroborate these and other hy- 
potheses, as observed patterns may originate 
from a variety of mechanisms. 

Fig. 3. Regulatory mechanisms and biological functions of sex-biased 
genes. (A) Genomic position enrichment of sex-biased genes, as indicated by 
male-biased (blue) and female-biased (red) genes across all chromosomes (left) 
and chromosome X (right). The height of each rim represents the tissue sharing 
of the significant genomic enrichment signal and ranges from 1 to 44 (number of 
tissue sources). See (9) for further details. (B) Transcription factor binding site 
(TFBS) enrichment in promoter regions of sex-biased genes. Of 92 enriched 
TFBS profiles, the top 40 with the largest difference across all tissues in the 


Promoters of sex-biased genes are enriched 
for hormone-related and other transcription 
factor binding sites 


We hypothesized that transcription factor (TF) 
activity might drive observed patterns of dif- 
ferential expression, because sex-biased gene 
regulation by TFs has recently been reported 
(13) and TFs contribute to evolutionary changes 
in sex bias (12). We tested for enrichment of TF 
binding sites (TFBSs) of 231 TFs previously 
identified through chromatin immunoprecipi- 
tation sequencing (22) in promoter regions 
(i.e., 2 kb upstream of the transcription start 
site) of male- and female-biased genes (9) (fig. 
S5E). We discovered enrichment for TFBSs of 
a total of 92 TFs (fig. S5F), two of which were 
X-linked (AR, ELK1). TFBSs for 54 TFs were 
enriched among female-biased genes and 
60 TFs among male-biased genes, with 22 TFs 
enriched among both sets of genes (table S6). 
The 92 TFs include (i) known hormone-related 
TFs estrogen (ESR1), androgen (AR), and glu- 
cocorticoid (NR3C1) receptors, (ii) 10 TFs that 
colocalize with steroid receptors, and (iii) TFs 
with a nonreported or less-characterized hor- 
mone association, including SP1, E2F6, NRF1, 
KLF9, and SP2, the top five TFs with consistent 
TFBS enrichment across tissues (9). 
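
The TF counts above follow an inclusion-exclusion relationship, which can be checked directly. The variable names are illustrative; only the counts come from the text.

```python
# Counts quoted in the text.
tfs_female_biased = 54   # TFs with TFBSs enriched among female-biased genes
tfs_male_biased = 60     # TFs with TFBSs enriched among male-biased genes
tfs_both = 22            # TFs enriched among both sets

# Inclusion-exclusion: |F or M| = |F| + |M| - |F and M|
tfs_total = tfs_female_biased + tfs_male_biased - tfs_both

tfs_female_only = tfs_female_biased - tfs_both
tfs_male_only = tfs_male_biased - tfs_both

print(tfs_total, tfs_female_only, tfs_male_only)
```

The total matches the 92 enriched TFs reported, with 32 specific to female-biased genes and 38 specific to male-biased genes.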

The strongest difference between male- and 
female-biased enrichment profiles was observed 
for TFBSs of SP2, SP4, NFYB, TWIST1, and 
STAT5B (female-biased) and of HNF4G, NFKB1, 
E2F6, HNF4A, and ETS1 (male-biased), respec- 
tively, which were detected across most tissues 
(Fig. 3B and table S6). In contrast, we observed 
tissue specificity for enrichment of TFBSs of 
several TFs, such as RFX2 and ETV4 for brain 
and breast tissues, respectively (Fig. 3B and 
fig. S5F). Although STAT5B and HNF4A play 
known roles in sex differences in body growth 
rates and liver gene expression (15), less is 
known about their roles and sex biases across 
all tissues. The effect of sex on most of the 
remaining TFs is uncharacterized. Together, 
these results suggest that hormone-related TFs 
regulate sex-biased expression as expected, but 
they also indicate that additional TFs play a role 
in sex-biased gene expression, in some cases in a 
tissue-specific manner (table S6). Notably, TFBS 
enrichment is not driven by sex-biased expres- 
sion of the TFs themselves (9), consistent with 
the observation that sex-biased TF targeting of 
genes is independent of sex-biased gene expres- 
sion (13). However, this scenario cannot be dis- 
carded if such differences occur at an earlier 
developmental time point and translate into a 
more constitutive sex-biased TF binding profile 
(23). Alternatively, other mechanisms involving 
TFs could be causal drivers [e.g., posttransla- 
tional modifications as reported in mice (24)]. 


Sex-biased genes are involved in a highly 
diverse set of biological functions and suggest 
sex-specific deposition of epigenetic marks 


To gain insight into cellular functions affected 
by sex-biased genes, we performed gene set 
enrichment analysis (GSEA) in each tissue, 
considering the direction of the sex effect (9) 
(fig. S6A and tables S7 and S8). To identify 
gene sets that are enriched across multiple 
tissues, we performed a meta-analysis using 
Fisher’s combined probability test and iden- 
tified 2134 enriched gene sets [false discov- 
ery rate (FDR) < 0.05; table S9]. We applied a 
community detection approach to identify 
common features across enriched gene sets 
and defined 36 clusters (table S9). Among the 
top-scoring clusters (9), we identified enrich- 
ment of genes in pathways involved in drug 
and hormone response, epigenetic marks, em- 
bryonic development and tissue morphogenesis, 
fertilization, sexual reproduction and spermato- 
genesis, fat metabolism, cancer, immune re- 
sponse, and other functions (Fig. 3C and table 
enrichment profile derived from male-biased and female-biased genes are 
displayed. Values represent the TFBS enrichment ranking transformed to [0, 1] 
per tissue and per sex; a value of 1 corresponds to the highest enrichment. 
See (9) for further details. (C) Clusters (gray circles) of gene sets enriched for 
genes highly expressed (blue and red balloons) in females (red) or males 
(blue) across tissues. Balloon size corresponds to the P value for the across-tissue 
meta-analysis of GSEA. Faint lines connecting balloons correspond to shared 
leading-edge genes between gene sets. See (9) for further details. 


S9). The top-scoring cluster corresponds to 
targets of polycomb repressive complex 2 (PRC2) 
and trimethylation of histone H3 at Lys27 
(H3K27me3), which is predominantly driven by 
female-biased genes—a pattern also reported 
for other epigenetic modifications (13). This 
complex induces gene silencing and is involved 
in XCI (25). Sex-specific deposition of H3K27me3 
marks has been previously reported, result- 
ing in sex-biased gene expression in mamma- 
lian placenta (26) and adult liver (27). These 
differences have been hypothesized to be reg- 
ulated by sex differences in the secretion of 
placental glycosyltransferase OGT and pitui- 
tary growth hormone. The observed associa- 
tion of H3K27me3 with sex-biased expression 
in the tissues of this study (table S9) has not 
been previously reported. We also identified 
clusters related to drug metabolism that in- 
clude CYP450 genes. Sex-biased expression 
of CYP450 has been reported in liver (75) and 
linked to sex-differentiated growth hormone 
profiles; we observed sex-biased expression in 
additional tissues (fig. S6B). Sex-biased expres- 
sion was also identified for clusters related to 
gonad tissue functions (e.g., meiotic synapsis), 
which comprise genes expressed largely in 
testis (fig. S6B). It is possible that some of the 
cross-tissue sex-biased expression patterns ob- 
served in adult tissues are derived from gamete 
formation and embryogenesis (28). Together, 
these results indicate that sex-biased genes are 
involved in a wide range of biological functions 
and pathways, many of which have not been 
previously associated with sex differences. 
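
The cross-tissue meta-analysis described above combines per-tissue GSEA P values with Fisher's combined probability test. The following is a minimal Python sketch of that test, not the authors' pipeline [see (9)]; the function name `fisher_combined_pvalue` and the example P values are illustrative assumptions. The classical test assumes the combined P values are independent.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine P values with Fisher's method.

    Under the null, X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom, where k is the number of P values.
    Each P value must lie in (0, 1].
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one P value")
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # sf(X; 2k) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical per-tissue P values for one gene set (illustration only).
per_tissue_p = [0.04, 0.20, 0.01]
combined = fisher_combined_pvalue(per_tissue_p)
```

With a single input the combined value equals that input, which is a quick sanity check on the implementation.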


Sex and disease influence tissue cellular 
composition 


The GTEx tissue samples are mixtures of het- 
erogeneous cell types, with variation among 
individuals and tissues (29). In whole blood, 
cell type composition differs between sexes 
(30, 31), but little is known about sex dif- 
ferences in composition of other tissues. Using 
a t test, we examined each GTEx tissue for sex 
differences in cellular composition on the basis 
of estimated abundances of seven cell types 
(9, 29). We discovered significant (FDR < 0.05) 
differences for four cell types—keratinocytes, 
neutrophils, adipocytes, and epithelial cells— 
in three tissues (fig. S7A and table S10). We 
hypothesize that additional cell types unchar- 
acterized in this study may influence the cell 
type composition of GTEx tissues, particularly 
of immune cells, because marked sex differences 
have been 
reported (30, 32). To investigate cellular abun- 
dances in disease, we used histological anno- 
tations from pathology review of GTEx tissue 
samples (9). We discovered six pathological 
phenotypes with altered cell type composition 
(fig. S7, B to E, and table S10). Together, these 
results suggest that sex is correlated with tis- 
sue cellular composition, and that disease may 
alter cellular abundances in a sex-differentiated 
manner or in sex-specific pathologies. 


Sex differences in the genetic regulation of gene 
expression are highly tissue-specific and less 
common than sex effects on gene expression 


Sex-differentiated human phenotypes and dis- 
ease characteristics may derive in part from 
sex-differentiated genetic effects (6, 33-36), 
some of which may have an impact on gene 
expression. For each of 491,694 conditionally 
independent cis-eQTLs identified in the sex- 
combined cis-eQTL analysis of the GTEx v8 
project (8), we performed sex-biased cis-eQTL 
(sb-eQTL) analysis in each of 44 tissues pres- 
ent in both sexes (Fig. 1). We used a linear 
regression model including genotype, sex, 
and covariates, and tested for significance of 
a genotype x sex (GxSex) interaction on ex- 
pression (9). Notably, this approach captures 
GxSex interactions that derive both from sex 
and from sex-correlated factors, including cell 
type abundances or environmental factors. Al- 
though the contribution of cell type heteroge- 
neity to sb-eQTLs is currently unknown, we 
observed sex differences in tissue cell type com- 
position (fig. S7A), which may affect sb-eQTL 
discovery. Hence, we characterized the impact 
of cell type-specific eQTLs on sb-eQTLs (see 
below). We discovered a total of 369 sb-eQTLs, 
corresponding to 366 genes (sb-eGenes) (FDR < 
0.25; table S11). The majority of sb-eQTLs 
were identified in breast tissue (261 sb-eQTLs), 
but also in muscle (36 sb-eQTLs), skin (18 sb- 
eQTLs), and adipose tissues (14 sb-eQTLs) (Fig. 
4A and fig. S8, A and B). Overall, sb-eQTLs 
showed strong evidence for tissue specificity 
(9); only one sb-eQTL was significant in two 
tissues (table S11), and only 21% displayed pat- 
terns suggestive of tissue-sharing even at a 
lenient significance threshold (P_GxSex < 0.01). 
Only 36 sb-eGenes (14%) exhibited sex-biased 
expression in the discovery tissue [multivar- 
iate adaptive shrinkage (MASH) LFSR < 0.05; 
table S12], similar to recent observations (37). 
This is compatible with small sb-eQTL effects 
not translating into significant sex-biased gene 
expression, or with different functional mech- 
anisms contributing to each sex bias type. 
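
The interaction model described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation [which adapts FastQTL; see (9)]: the helper names `ols` and `gxsex_design`, the 0/1 sex coding, and the toy data are hypothetical, covariates are omitted, and only the coefficient is estimated (no significance test).

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


def gxsex_design(genotypes, sexes):
    """Design rows: intercept, genotype, sex, genotype x sex."""
    return [[1.0, g, s, g * s] for g, s in zip(genotypes, sexes)]


# Toy data: allele dosage 0/1/2, sex coded 0/1 (coding is an assumption).
genotypes = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
sexes = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
expression = [1.0 + 0.5 * g + 0.2 * s + 0.3 * g * s
              for g, s in zip(genotypes, sexes)]

beta = ols(gxsex_design(genotypes, sexes), expression)
beta_gxsex = beta[3]  # interaction coefficient: slope difference between sexes
```

Without additional covariates, the interaction coefficient equals the difference between the genotype slopes fitted separately in each sex.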

To provide additional support for the sb- 
eQTLs, we used two approaches to assess dif- 
ferential allele-specific expression (ASE) between 
sexes: allelic fold change (ASE aFC) (38) and 
environment ASE through generalized linear 
modeling (EAGLE) (9, 39). Allele-specific expression 
can result from cis-regulatory genetic 
effects in heterozygous individuals. Differen- 
tial ASE therefore indicates condition-specific 
cis effects (39), including sex specificity. We 
observed that both approaches, despite lim- 
ited power when restricted to heterozygous 
individuals and differences in methodology, 
indicate that a portion of the detected sb- 
eQTLs correspond to sex differences in ASE 
(fig. S8C): sb-eQTLs were enriched for sex-biased 
ASE aFC (all tissues, π1 = 0.36; breast, 
π1 = 0.41; fig. S8, D and E) and for EAGLE 
associations (π1 = 0.13, empirical test, P < 
0.001). Of the 243 and 163 sb-eQTLs tested by 
ASE aFC and EAGLE methods, respectively, 65 
(26.7%) were supported by ASE aFC (Wilcoxon 
P < 0.05) (fig. S7, F and G), 29 (17.8%) were 
supported by significant EAGLE associations, 
and 16 sb-eQTLs (10.4% of the 154 sb-eQTLs 
tested by both methods) were supported by 
both methods (table S11). 
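
The π1 statistic quoted above estimates the fraction of tested associations that are true positives. A minimal sketch follows, assuming Storey's fixed-lambda estimator of π0; the text does not state which estimator or tuning value was used, so `pi1_estimate`, `lam = 0.5`, and the example P values are assumptions for illustration.

```python
def pi1_estimate(pvalues, lam=0.5):
    """Estimate pi1 = 1 - pi0 with a fixed-lambda Storey estimator.

    Null P values are uniform, so the share of P values above `lam`,
    rescaled by 1 / (1 - lam), estimates the null proportion pi0.
    """
    m = len(pvalues)
    if m == 0:
        raise ValueError("need at least one P value")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    n_above = sum(1 for p in pvalues if p > lam)
    pi0 = min(1.0, n_above / (m * (1.0 - lam)))
    return 1.0 - pi0


# Hypothetical replication P values: six strong signals, four null-like.
replication_p = [0.001] * 6 + [0.6, 0.7, 0.8, 0.9]
pi1 = pi1_estimate(replication_p)
```

A value near 0 indicates that the replication P values look uniform; a value near 1 indicates that nearly all tested associations carry signal.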

We were limited in our ability to replicate 
sb-eQTLs because the majority of sb-eQTLs 
were discovered in breast tissue, and matching 
well-powered datasets do not exist. We per- 
formed internal validation, splitting GTEx 
breast samples into discovery and validation 
cohorts, and observed moderate replication 
(mean π1 = 0.28) (9) (fig. S8H). We next assessed 
sb-eQTL replication (considering sb-eQTLs 
from breast, whole blood, and all tissues) 
in independent larger (~900 subjects) whole- 
blood eQTL datasets, including DGN (40) and 
GAIT2 (41) cohorts (9) (table S13). We observed 
weak replication (π1 = 0 to 0.12, depending 
on sb-eQTL set and replication cohort). Poor 
replication of sb-eQTLs has been reported 
(40, 42, 43) and has been, in part, attributed 
to low power (44) but also to methodological 
and study design differences. 

For each sb-eGene, we also performed sex- 
stratified cis-eQTL analysis for each tissue, 
downsampling males to match the female 
sample size (9). We observed strong correlation 
(Spearman’s rank correlation ρ = 0.78, 
P < 2.2 x 10^-16) between male and female cis-eQTL 
effect sizes. For 58% of sb-eQTLs, 
sex-stratified cis-eQTL analysis revealed asso- 
ciations in both sexes with concordant allelic 
effect but different effect sizes. For example, 
rs117380715-ADRA1A in adipose subcutaneous 
tissue showed a stronger effect in females than 
in males (β_F = -0.78, P_F = 4.64 x 107", β_M = 
-0.47, P_M = 3.98 x 107°) (Fig. 4B and fig. S8I). 
For the remainder of the sb-eQTLs, a cis-eQTL 
was detected exclusively in either females 
(70, 19%) or males (84, 23%). For example, 
we identified a female-specific cis-eQTL for 
rs894.2-C4BPB in breast (β_F = 0.40, P_F = 2.68 x 
1077, β_M = -0.02, P_M = 0.89) (Fig. 4B and fig. 
S8I). C4BPB encodes the beta unit of the C4b-binding 
protein and controls activation of the 
complement cascade (45). We also identified 
a male-specific cis-eQTL for rs2273535-AURKA 
in skeletal muscle (β_M = 0.47, β_F = 0.01), described 
in (8). AURKA, encoding Aurora kinase 
A, is a member of the serine/threonine kinase 
family involved in mitotic chromosomal segre- 
gation and muscle differentiation (46) and is a 
known risk factor for several cancers (47). These 
results demonstrate that sex-biased genetic ef- 
fects on gene expression exist for a small pro- 
portion of previously identified cis-eQTLs, and 
that some sb-eQTLs affect genes implicated in 
human phenotypes. 
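
The sex-stratified comparison above, in which males are downsampled to the female sample size before per-sex effect sizes are estimated, can be sketched as follows. This is an illustrative simplification: the function names, the "F"/"M" labels, the fixed random seed, and the toy data are assumptions, and the simple regression here omits the covariates used in the actual analysis (9).

```python
import random


def slope(x, y):
    """Simple linear-regression slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def sex_stratified_slopes(genotypes, expression, sexes, seed=0):
    """Per-sex eQTL slopes after downsampling males to the female count."""
    female_idx = [i for i, s in enumerate(sexes) if s == "F"]
    male_idx = [i for i, s in enumerate(sexes) if s == "M"]
    if len(male_idx) > len(female_idx):
        # Random subset of males, reproducible through the seed.
        male_idx = random.Random(seed).sample(male_idx, len(female_idx))
    result = {}
    for label, idx in (("F", female_idx), ("M", male_idx)):
        result[label] = slope([genotypes[i] for i in idx],
                              [expression[i] for i in idx])
    result["n_per_sex"] = (len(female_idx), len(male_idx))
    return result


# Toy data: 6 females with slope 0.5, 9 males with slope 0.1.
genotypes = [0, 1, 2, 0, 1, 2] + [0, 1, 2] * 3
sexes = ["F"] * 6 + ["M"] * 9
expression = [1.0 + (0.5 if s == "F" else 0.1) * g
              for g, s in zip(genotypes, sexes)]

stratified = sex_stratified_slopes(genotypes, expression, sexes)
```

Matching sample sizes keeps the two slope estimates comparably powered, so that an association seen in only one sex is less likely to reflect sample size alone.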


Sex differences in genetic regulation of gene 
expression are partially mediated by cell 
type-specific eQTLs 

Given that the GxSex interaction term of our 
eQTL model captures interactions that derive 
from sex as well as interactions with sex- 
correlated factors, we next characterized the 
fraction of sex-biased eQTLs that are driven 
by cell type-specific eQTLs (fig. S9A). We 
focused on breast, the tissue with the most 
sb-eQTLs and the largest sex differences in 
cellular composition (figs. S7A and S8B). 
We tested 261 breast sb-eQTLs for enrichment 
of cell type interacting cis-eQTLs (ieQTLs) 
(9, 29). These ieQTLs correspond to cis-eQTLs 
where the effect varies depending on estimated 
cell type abundances (29). Breast sb-eQTLs 
were strongly enriched (π1 = 0.66 and 0.89) 
for ieQTL signal corresponding to adipocytes 
and epithelial cells (fig. S9B). After including 
an interaction term for genotype x epithelial 
cell abundance estimates in the sb-eQTL model, 
58% of breast sb-eQTLs (152/261) remained 
significant, whereas for 42% of sb-eQTLs (109/ 
261), the genotype x sex effect was strongly 
attenuated (fig. S9C and table S14). For example, 
the strongest breast sb-eQTL, rs2289149-LINC00920 
(P = 4.83 x 10™), was not significant 
after incorporating the genotype x epithelial 
cell abundance estimates in the model (β_GxSex = 
0.187, 95% confidence interval = [-0.004, 0.378]; 
fig. S9C and table S14). 

To formally test the impact of cell type com- 
position on sb-eQTL detection, we performed 
a mediation analysis, using genotype inter- 
actions with estimated epithelial cell abun- 
dance as a potential mediator (9) (fig. S9D). 
We discovered that 60 sb-eQTLs (23%) were 
mediated by cell type abundances (average 
causal mediation effects P < 0.001) (Fig. 4C 
and table S14). Mediation by other cell types 
cannot be excluded, particularly by immune 
cells: We observed that breast sb-eGenes are 
enriched for immunoglobulin variable chain 
genes (Fisher’s exact test, odds ratio = 12, P = 
9.2 x 10°*). In all cases, the eQTL effect size is 
larger in females (table S11). Because immu- 
noglobulin genes are mainly expressed in B 
cells and are among the most sex-discrimina- 
tive genes in breast (fig. S7D), we hypothesize 
that immunoglobulin sb-eQTLs may be driven 
β_M = -0.47, P_M = 3.98 x 107°, P_GxSex = 1.05 x 10°) and 
C4BPB locus in breast mammary tissue (lower panels; 
β_F = 0.40, P_F = 2.68 x 10°”, β_M = -0.02, P_M = 0.89, 
P_GxSex = 7.22 x 10°). Linkage disequilibrium between 
loci is quantified by squared Pearson coefficient of 
correlation (r^2). Diamond-shaped point represents the 
top significant eQTL variant across sex-stratified P values. 
by greater abundances of this cell type in 
female breasts. Collectively, these results in- 
dicate that a large proportion of sb-eQTLs in 
breast are driven by cell type-specific genetic 
effects on gene expression that become apparent 
when cell types differ between sexes, 
although our analysis cannot distinguish 
whether the tested cell types or others cor- 
related with them (fig. S9E) are the true me- 
diators of the signal. 


Sex-aware eQTL-GWAS colocalization 
provides insights into the genetic basis of 
complex traits 

To assess whether sb-eQTLs are useful as 
a means of dissecting the molecular basis of 
complex trait associations, we performed colocalization 
(48) between sex-stratified cis-eQTLs 
and 87 GWASs, representing 74 distinct com- 
plex traits, for 1089 sb-eGenes at a more relaxed 
FDR (<0.50) (9). We identified 74 colocalized 
gene-trait pairs [posterior probability of shar- 
ing the same causal variant (PP4) > 0.5; Fig. 5, 
A to C]. Of these, 58 were colocalized (PP4 > 0.5) 
in one sex but not in the other—36 for females 
and 22 for males—corresponding to 36 unique 
genetic loci and 27 distinct traits (Fig. 5, A to 
C, and table S15). For 24/36 (67%) female- 
stratified and 10/22 (45%) male-stratified 
cis-eQTL-trait pairs, evidence for colocalization 
was also found using the male and female 
combined GTEx v8 cis-eQTLs (fig. S10A). 
For these 34 loci that colocalized in the sex- 
combined approach, we found evidence that 
the colocalization signal is driven by regula- 
tory effects in a single sex. The remaining 12/ 
36 (33%) female and 12/22 (55%) male gene- 
trait colocalizations were not discovered with 
the sex-combined approach. 
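
The bookkeeping behind these counts is a threshold comparison of the posterior probability PP4 obtained separately with the female-stratified and male-stratified cis-eQTLs. A minimal sketch, assuming PP4 values have already been computed by a colocalization tool; the function name, the pair labels, and the PP4 values below are hypothetical.

```python
from collections import Counter


def classify_colocalization(pp4_female, pp4_male, threshold=0.5):
    """Label a gene-trait pair by the sex in which PP4 exceeds the threshold."""
    in_female = pp4_female > threshold
    in_male = pp4_male > threshold
    if in_female and in_male:
        return "both sexes"
    if in_female:
        return "female only"
    if in_male:
        return "male only"
    return "not colocalized"


# Hypothetical (PP4 female, PP4 male) values for illustration only.
pairs = {
    "GENE_A / trait_1": (0.92, 0.10),
    "GENE_B / trait_2": (0.05, 0.81),
    "GENE_C / trait_3": (0.77, 0.64),
    "GENE_D / trait_4": (0.20, 0.31),
}
labels = {name: classify_colocalization(f, m) for name, (f, m) in pairs.items()}
counts = Counter(labels.values())
```

Pairs labeled "female only" or "male only" correspond to the single-sex colocalizations counted in the text.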

The strongest colocalizations between a trait 
and a female-stratified cis-eQTL were identi- 
fied for CCDC88C and breast cancer, and for 
HKDC1 and birth weight (Fig. 5, C and D). Conversely, 
the strongest colocalizations between a 
trait and a male-stratified cis-eQTL were identi- 
fied for DPYSL4 and percentage of body fat, and 
for CLDN7 and birth weight (Fig. 5, C and E). 
CCDC88C is a negative regulator of the Wnt 
signaling pathway, a key mechanism in cancer 
progression (49), and the CCDC88C female cis- 
eQTL signal in breast colocalizes with risk of 
breast cancer (Fig. 5D, left), a trait with highly 
sex-differentiated incidence and presentation 
(50). For breast cancer, we identified two additional 
female-driven (PP4_F > PP4_M) colocalized 
sb-eGenes, NTN4 and CRLF3 (table S15), 
previously reported as breast cancer-relevant 
genes (51, 52). 

We also discovered a preferential colocali- 
zation of blood and immune traits with female- 
stratified relative to male-stratified cis-eQTLs 
(odds ratio = 2.22; P = 0.0477, Fisher’s exact test). 
This includes inflammatory bowel diseases, 
which show a higher prevalence in females with 
increasing age (53), and immune cell abun- 
dances in blood, which also exhibit sex differ- 
ences (30, 31). Together, these results suggest 
that sex-biased genetic regulation of gene ex- 
pression may contribute to the etiology of dis- 
eases with marked sex differences. 

Moreover, we identified colocalization signal 
for eQTLs and GWAS of sex-specific traits as 
well as signal possibly derived from sex-specific 
conditions, such as pregnancy in females and 
balding patterns in males. The C9orf66 male-stratified 
cis-eQTL signal in breast colocalized 
with balding patterns in males, and the HKDC1 
female-stratified cis-eQTL signal in liver colocalized 
with birth weight, which is strongly 
influenced by maternal factors (Fig. 5D, right) 
(54). The sb-eQTL for this locus in liver was 
replicated in an independent dataset (55) 
(rs35696875-HKDC1 Py; = 2.73 x 107°, Py = 
1.60 x 10%, z-test P = 0.004; fig. S10B). HKDC1 
encodes a member of the hexokinase protein 
family and is involved in glucose metabolism. 
Multiple variants in perfect or high linkage 
disequilibrium with rs35696875 that cause 
reduced expression of HKDC1 have been associated 
with gestational diabetes mellitus risk 
(56) and glycemic traits during pregnancy (54). 
Here, we confirmed that the HKDC1 female 
eQTL signal in the liver colocalizes with maternal 
glucose levels in plasma during pregnancy 
(PP4 = 0.92; fig. S10C). Recently, regulatory 
variants spanning multiple enhancers were 
found to have a coordinated allelic effect on 
HKDC1 expression in hepatocyte-derived cells 
(57). Estimates of hepatocyte abundance in 
GTEx liver samples did not differ by sex (P = 
0.30), and the rs35696875-HKDC1 sb-eQTL 
showed no evidence of being a hepatocyte 
ieQTL (P_GxHepatocytes = 0.11) (29). Thus, unlike 
many sb-eQTLs in breast, the HKDC1 sb-eQTL 
in liver did not seem to be driven by sex-differentiated 
cell type abundances. The HKDC1 
sb-eQTL alternative allele is associated with 
lower HKDC1 expression, higher maternal glucose 
levels, and increased birth weight. These 
results suggest that the HKDC1 female cis-eQTL 
influences glucose metabolism in the 
pregnant female, which is reflected in the birth 
weight of the offspring. Further investigation is 
needed, however, to prove causality. 

Additionally, the DPYSL4 male-stratified 
cis-eQTL signal in skeletal muscle colocalized 
with genetic signal associated with percentage 
of body fat (Fig. 5E, right). DPYSL4 is linked to 
the pathophysiology of obesity and cancer: 
p53-inducible DPYSL4 associates with mito- 
chondrial supercomplexes and regulates energy 
metabolism in adipocytes and cancer cells. Low 
DPYSL4 expression is associated with poor 
survival of breast cancer patients (58). Of note, 
although the colocalizing signal was detected 
with the male-stratified cis-eQTL signal, the 
low probability of colocalization appears to be 
due to the presence of an additional cis-eQTL 
in females that is absent in males. These results 
suggest that characterizing sex differences in 
the genetic associations of complex traits and 
molecular phenotypes can prove useful to dis- 
sect allelic heterogeneity. 

Five colocalized sb-eGenes (CLDN7, CCDC125, 
FAM53B, PLEC, and SOWAHC), corresponding 
to cell type interaction cis-eQTL (cell type 
ieQTL) signals, also colocalized with reported 
GWAS signals (birth weight, blood cell counts, 
height, platelet counts, and schizophrenia, re- 
spectively) (29). For instance, the male-biased 
cis-eQTL rs34958987-CLDN7 in breast (Fig. 5E, 
left, and fig. S10D) was identified as an epithelial 
cell ieQTL in breast (29). Both the sb-eQTL 
and cell type ieQTL signals colocalized 
with the birth weight GWAS signal (fig. S10E). 
This suggests that the origin of these sex dif- 
ferences in gene-trait associations may be in 
sex-differentiated cell type abundances. 

Finally, to assess whether sex-biased eQTL 
signals are reflected in sex-biased GWAS ef- 
fects, we obtained sex-stratified GWAS data 
for 36 of the 58 colocalized gene-trait pairs (9) 
(table S15). We identified two of 36 loci with 
sex differences in GWAS effect size (FDR < 
0.05, Bonferroni correction). These two sig- 
nals correspond to RNASET2 and CELSR2 
genes, which are more strongly associated with 
hyperthyroidism in females and with heart attack 
in males, respectively. However, with the 
current GWAS sample sizes, we observed that, 
in general, sex-biased effects at the eQTL level 
do not readily translate into sex-biased effects 
at the GWAS level, in line with recent power 
calculations where millions of GWAS samples 
were estimated to be needed to address this 
question (37). 
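
A common way to test for a sex difference in effect size at a single locus is a z-test on the difference between the female and male estimates. The sketch below assumes that form of test and independent male and female samples; the text does not give the exact statistic used for the GWAS comparison, and the function name and example numbers are hypothetical.

```python
import math


def sex_difference_z_test(beta_f, se_f, beta_m, se_m):
    """Two-sided z-test for beta_f != beta_m with independent estimates."""
    z = (beta_f - beta_m) / math.sqrt(se_f ** 2 + se_m ** 2)
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical sex-stratified effect estimates for one variant.
z_stat, p_value = sex_difference_z_test(beta_f=0.12, se_f=0.03,
                                        beta_m=0.02, se_m=0.04)
```

Because the standard errors enter the denominator, small per-sex GWAS sample sizes limit the power of this comparison, which is consistent with the power considerations noted above.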

Overall, our colocalization results identified 
loci where sex-differentiated cell type abun- 
dances mediate genotype-phenotype associa- 
tions, and also loci where sex may play a more 
direct role in the underlying molecular mechanism 
of the association, as in the HKDC1 locus. 
For future studies, accounting for context or 
environment (sex in the present study) in co- 
localization approaches is a promising approach 
to the discovery of gene-trait associations and 
their underlying origins. 


Discussion 


We identified widespread sex-biased gene ex- 
pression in all tissues, with 37% of genes ex- 
hibiting sex bias in at least one tissue, but 
with overall small (median FC = 1.04) sex ef- 
fects. These results derive from overlapping 
male and female distributions of interindividual 
expression variation, indicative of differential 
expression as opposed to completely dimorphic 
expression. These genes represent diverse mo- 
lecular and biological functions, and they in- 
clude genes relevant to disease and clinical 
phenotypes. As expected, the strongest sex 
bias was observed for X-chromosome genes, 
whereas the vast majority of sex-biased genes 
were autosomal, which suggests the influence 
of sex on genome-wide regulatory programs. 
As reported in (59) but not well characterized 
to date, we discovered that a portion of these 
genes were nonrandomly distributed across 
the genome, suggesting sex differences in re- 
gional regulation. Integration of these results 
with sex-aware analysis of epigenetic and chro- 
mosome conformation capture Hi-C data 
may provide mechanistic insights into these 
patterns. 

Although we identified a set of X-linked 
genes with sex-biased expression across many 
tissues, the overall sharing of sex-biased ex- 
pression among tissues was strongly skewed 



[Fig. 5 graphic, panels A to E. (A) PP4 Females (x axis) versus PP4 Males (y axis), 
GWAS colocalization with female and male eQTLs; tissue legend: Adipose - Subcutaneous (6), 
Artery - Aorta (1), Brain - Cerebellar Hemisphere (1), Breast - Mammary Tissue (49), 
Liver (3), Muscle - Skeletal (13), Skin - Not Sun-Exposed (Suprapubic) (1). 
(C) GWAS traits labeled by the sex with maximum PP4 (Female or Male). 
(D and E) -log10(P value) versus position on chr14, chr10, chr17, and chr10 (Mb).] 


Oliva et al., Science 369, eaba3066 (2020) 11 September 2020 


Fig. 5. Colocalization of sb-eQTLs with GWAS traits. (A) Posterior probability 
(PP4) of 74 colocalized gene-trait pairs where a GWAS shows evidence of 
colocalization with the female-stratified and/or male-stratified cis-eQTL signal 
(PP4 > 0.5). Numbers of colocalizing loci per tissue are shown in parentheses. 
(B) Numbers of colocalizing loci for female and male cis-eQTLs. (C) GWAS-eQTL 
colocalizing genes (PP4 > 0.5) color-labeled by eQTL tissue of origin according 
to labels in (A) (x axis) are categorized by the sex where the colocalization 
signal is maximized with the corresponding GWAS trait (y axis). Comparing the 
colocalization PP4 values for male and female cis-eQTL signals, the estimates 
can be maximum in females (red) or males (blue). (D) Genotype-phenotype 
association P values of the CCDC88C (left) and HKDC1 (right) loci. For the 
CCDC88C locus, panels illustrate GWAS signal for breast cancer (top) and 


toward tissue specificity, with 18.2% of sex- 
biased genes discovered in only a single tis- 
sue. The high tissue specificity of sex-biased 
gene expression and the enrichment of TFBSs 
in sex-biased gene promoters implicate specific 
TFs in mediating sex-biased expression. Func- 
tional experiments to assess sex-differentiated 
TF binding are needed to evaluate the role of 
TF function in observed patterns. 

In contrast to the large impact of sex on 
gene expression levels, the overall extent of sex 
effects on genetic regulation in cis is much less 
(369 sb-eQTLs). This observation is consistent 
with an overall weaker role of sex in genetic 
regulation but is also affected by differences in 
power of the two analyses (60). For sb-eQTLs, 
the combination of small genotype x sex inter- 
action effect sizes, high interindividual expres- 
sion heterogeneity, and the sex imbalance in 
the GTEx collection affects the power of the 
interaction test. This implies that much larger 
cohorts are needed to fully characterize this 
phenomenon, particularly to assess sex ef- 
fects for all cis variants and genes. The rel- 
atively modest number of GxSex interactions 
for a factor as impactful as sex suggests that 
other, more subtle genotype-interacting envi- 
ronmental factors are likely to be challenging 
to identify [as noted in (39)]. The sb-eQTL 
analysis is also affected by cell type heteroge- 
neity within tissues. We demonstrated that a 
portion of sb-eQTLs are mediated by cell type 
composition, which suggests that a portion of 
the sb-eQTL signal may derive from the com- 
bination of cell type-specific eQTLs and sex 
differences in the tissue’s cell type compo- 
sition. The remaining loci for which we had no 
evidence of cell type mediation may represent 
true sex differences in genetic regulation of 
these genes, but might also derive from un- 
known factors confounded with sex, including 
cell types that were not part of our analysis. 
Thus, the full impact of cell type differences 
across tissues remains to be determined. 

The identification of sb-eQTLs that are un- 
equivocally not derived from sex differences 
in cell type abundances cannot be assessed 
with analysis of sb-eQTLs in bulk tissue. We 
anticipate that single-cell sb-eQTL analysis will 
help to disentangle sex effects on the genetics 
of gene expression that derive from sex differences 
in tissue composition versus those that 
derive from sex chromosome status. How- 
ever, this approach also has limitations due 
to the removal of cells from the in situ tissue 
environment—including, for example, the pres- 
ence of other cell types and diverse hormonal 
environments. 

In efforts to understand the molecular 
basis of sex differences in disease and other 
phenotypes, it is important to note that the 
connection between the molecular changes 
observed here and complex phenotypes is 
likely to be complicated by many compen- 
satory and buffering effects (61). Despite ex- 
tensive sex differences at the transcriptome 
level, the majority of biology at all phenotype 
levels is shared between males and females. 
Furthermore, the sex differences observed 
here are based on a snapshot of mostly older 
individuals. Sex differences that occur dur- 
ing different developmental stages, in specific 
environments, or in specific disease states are 
not well represented in our analysis. For ex- 
ample, sex biases are observed in many can- 
cers (7). Our results provide a resource of sex 
effects in “nondiseased” tissues to compare 
with those of disease cohorts. We note that 
sex is highly correlated with many features 
of behavior and external environments [e.g., 
smoking (62)], and disentangling sex differ- 
ences driven by inherent biology versus gen- 
dered environments is an important further 
challenge. 

Beyond gene expression, sex-biased genetic 
regulation may also contribute to higher-order 
phenotypes such as complex traits and dis- 
eases; colocalization analysis of sex-stratified 
cis-eQTLs and sex-combined GWAS summary 
statistics yielded variant-gene-trait associations 
that were not detected in combined-sex cis- 
eQTL colocalization analysis. In general, context- 
aware colocalization analyses may help to 
elucidate the origin of gene-trait associations, 
as hypothesized here for HKDC1’s impact on 
birth weight through alteration of glucose 
metabolism in a pregnant female’s liver. We 
show that sex-biased gene-trait associations 
are likely attributable to either allelic heter- 
ogeneity in the combined-sex cohort or genetic 
CCDC88C cis-eQTL signal for females (middle) and males (bottom) in breast 
mammary tissue. For the HKDC1 locus, panels illustrate GWAS signal for birth 
weight (top) and HKDC1 cis-eQTL signal for females (middle) and males 
(bottom) in liver. (E) Genotype-phenotype association P values of the CLDN7 
(left) and DPYSL4 (right) loci. For the CLDN7 locus, panels illustrate GWAS signal 
for birth weight (top) and CLDN7 cis-eQTL signal for females (middle) and males 
(bottom) in breast mammary tissue. For the DPYSL4 locus, panels illustrate 
GWAS signal for body fat (top) and DPYSL4 cis-eQTL signal for females (middle) 
and males (bottom) in muscle skeletal tissue. In (D) and (E), linkage 
disequilibrium between loci is quantified by squared Pearson coefficient of 
correlation (r^2). Diamond-shaped point represents the top significant cis-eQTL 
variant across sex-stratified P values. 


effects on gene expression that are (predom- 
inantly) driven by a single sex; colocalized sb- 
eGenes cannot be considered as proxies of loci 
harboring sex differences in the genetic ar- 
chitecture of the linked trait. Because sex-aware 
colocalizations can provide insights into the sex- 
differentiated genetic architecture of disease, we 
expect future work in this area combining sex- 
stratified cis-eQTLs with summary statistics 
from sex-stratified GWASs to enable us to 
fully comprehend the impact of sex on human 
health and disease. The extension of analytical 
approaches to facilitate widespread genetic 
analysis of sex chromosomes is an important 
step toward these new research directions. 


Methods summary 


Sex-differential expression was performed with 
voom-limma (63) and MASH (64) (fig. S1A). Sex-differential 
effect sizes and gene expression 
levels were investigated for tissue specificity 
with the Tau index (65), clustered with pvclust 
(66), and compared with dendextend (67) 
(fig. S3A). Sex predictivity of sex-biased genes 
per tissue was quantified through gradient- 
boosted tree classifier models (68) (fig. S4A). 
Positional gene enrichment analysis of sex- 
biased genes was performed with PGE (18) 
(fig. S5A). Transcription factor binding site 
enrichment in promoter regions of sex-biased 
genes was performed with Unibind (22) and 
runLOLA (69) (fig. S5E). Gene set enrichment 
analysis was performed with fgsea (70) (fig. 
S6A) and results characterized with Cyto- 
scape (71). Sex differences in cell type abun- 
dances and their effect on histopathological 
phenotypes were explored using linear re- 
gression. sb-eQTL mapping was implemented 
using an adaptation of FastQTL (72) (fig. S8A); 
sb-eQTLs were validated using haplotype- 
level allelic expression data generated with 
phASER and allele-specific expression mod- 
eling using EAGLE. Characterization of sex- 
specific cis-eQTL effects was performed with 
linear regression. Mediation of GxSex by 
GxEpithelial interactions was tested with 
the mediation R package. Colocalization of 
GWAS and eQTLs was performed with coloc 
(48). Further details for each analysis are 
provided in (9). 

INTRODUCTION: Efforts to map quantitative 
trait loci (QTLs) across human tissues by the 
GTEx Consortium and others have identified 
expression and splicing QTLs (eQTLs and 
sQTLs, respectively) for a majority of genes. 
However, these studies were largely performed 
with gene expression measurements from bulk 
tissue samples, thus obscuring the cellular spe- 
cificity of genetic regulatory effects and in turn 
limiting their functional interpretation. Identi- 
fying the cell type (or types) in which a QTL is 
active will be key to uncovering the molecular 
mechanisms that underlie complex trait varia- 
tion. Recent studies demonstrated the feasib- 
ility of identifying cell type-specific QTLs from 
bulk tissue RNA-sequencing data by using com- 
putational estimates of cell type proportions. To 
date, such approaches have only been applied to 
a limited number of cell types and tissues. By 
applying this methodology to GTEx tissues for 
a diverse set of cell types, we aim to charac- 
terize the cellular specificity of genetic effects 
across human tissues and to describe the con- 
tribution of these effects to complex traits. 




RATIONALE: A growing number of in silico cell 
type deconvolution methods and associated 
reference panels with cell type-specific marker 
genes enable the robust estimation of the 
enrichment of specific cell types from bulk 
tissue gene expression data. We benchmarked 
and used enrichment estimates for seven 
cell types (adipocytes, epithelial cells, hep- 
atocytes, keratinocytes, myocytes, neurons, 
and neutrophils) across 35 tissues from the 
GTEx project to map QTLs that are specific 
to at least one cell type. We mapped such 
cell type-interaction QTLs for expression and 
splicing (ieQTLs and isQTLs, respectively) by 
testing for interactions between genotype and 
cell type enrichment. 


RESULTS: Using 43 pairs of tissues and cell 
types, we found 3347 protein-coding and long 
intergenic noncoding RNA (lincRNA) genes 
with an ieQTL and 987 genes with an isQTL 
(at 5% false discovery rate in each pair). To val- 
idate these findings, we tested the QTLs for 
replication in available external datasets and 


applied an independent validation using allele- 
specific expression from eQTL heterozygotes. 
We analyzed the cell type-interaction QTLs 
for patterns of tissue sharing and found that 
ieQTLs are enriched for genes with tissue- 
specific eQTLs and are generally not shared 
across unrelated tissues, suggesting that tissue- 
specific eQTLs originate in tissue-specific cell 
types. Last, we tested the ieQTLs and isQTLs for 
colocalization with genetic associations for 
87 complex traits. We show that cell type- 
interaction QTLs are enriched for complex 
trait associations and identify colocalizations 
for hundreds of loci that were undetected in 
bulk tissue, corresponding to an increase of 
>50% over colocalizations with standard QTLs. 
Our results also reveal the cellular specificity 
and potential origin for a similar number of 
colocalized standard QTLs. 


CONCLUSION: The ieQTLs and isQTLs identi- 
fied for seven cell types across GTEx tissues 
suggest that the large majority of cell type- 
specific QTLs remains to be discovered. Our 
colocalization results indicate that compre- 
hensive mapping of cell type-specific QTLs 
will be highly valuable for gaining a mech- 
anistic understanding of complex trait asso- 
ciations. We anticipate that the approaches 
presented here will complement studies map- 
ping QTLs in single cells. 

Detection of cell type-specific effects on gene expression. The enrichment of seven cell types is calculated across GTEx tissues, enabling mapping of cell type-interaction 
QTLs for expression and splicing by testing for significant interactions between genotypes and cell type enrichments. Linking these QTLs to complex trait 
associations enables discovery of >50% more colocalizations compared with standard QTLs and reveals the cellular specificity of traits. 

Barbara E. Stranger

The Genotype-Tissue Expression (GTEx) project has identified expression and splicing quantitative 
trait loci in cis (QTLs) for the majority of genes across a wide range of human tissues. However, the 
functional characterization of these QTLs has been limited by the heterogeneous cellular composition 
of GTEx tissue samples. We mapped interactions between computational estimates of cell type 
abundance and genotype to identify cell type-interaction QTLs for seven cell types and show that 
cell type-interaction expression QTLs (eQTLs) provide finer resolution to tissue specificity than bulk 
tissue cis-eQTLs. Analyses of genetic associations with 87 complex traits show a contribution from 
cell type-interaction QTLs and enable the discovery of hundreds of previously unidentified 
colocalized loci that are masked in bulk tissue. 


The Genotype-Tissue Expression (GTEx) 
project (1) and other studies (2-5) have 
shown that genetic regulation of the 
transcriptome is widespread. The GTEx 
Consortium in particular has built an 
extensive catalog of expression and splicing 
quantitative trait loci in cis (cis-eQTLs and cis-sQTLs, 
respectively) across a large range of 
tissues, showing that these cis-eQTLs and cis-sQTLs 
(collectively referred to here as QTLs) 
are generally either highly tissue specific or 
widely shared, even across dissimilar tissues 
and organs (1, 6). However, the majority of these 
studies have been performed by using hetero- 
geneous bulk tissue samples comprising diverse 
cell types. This limits the power, interpretation, 
and downstream applications of QTL studies. 
Genetic effects that are active only in rare cell 
types within a sampled tissue may be unde- 
tected, a mechanistic interpretation of QTL 
sharing across tissues and other contexts is 
complicated without understanding differences 
in cell type composition, and inference 
of downstream molecular effects of regulatory 
variants without the specific cell type context 
is challenging. Efforts to map eQTLs in individual 
cell types have been largely restricted to 
blood, using purified cell types (7-11) or single-cell 
sequencing (12). 

Although there are many ongoing efforts to 
optimize single-cell and single-nucleus sequencing 
of human tissues (13, 14), including as part 
of the Human Cell Atlas (15), these methods 
are not yet scalable to sample sizes and cov- 
erage sufficient to achieve power comparable 
with that of bulk eQTL studies (16-18). How- 
ever, cell type-specific eQTLs can be computa- 
tionally inferred from bulk tissue measurements 
by using estimated proportions or enrichments 
of relevant cell types to test for interactions 
with genotype. To date, such approaches have 
only been applied to a limited range of cell types, 
such as blood cells (19, 20) and adipocytes (21). 
These studies identified thousands of cell type 
interactions in eQTLs discovered in whole-blood 
samples from large cohorts [5683 samples, 
(19); 2116 samples, (20)], indicating that 
large numbers of interactions are likely to be 
identified by expanding this type of analysis to 
other tissues and cell types. 


Identifying cell types in silico in bulk tissue 


We used computational estimates of cell type 
enrichment to characterize the cell type spe- 
cificity of cis-eQTLs and cis-sQTLs for 43 cell 
type-tissue combinations, using seven cell 
types across 35 tissues (Fig. 1A). Estimating 
the cell type composition of a tissue biospeci- 
men from RNA-sequencing (RNA-seq) remains 
a challenging problem (22), and multiple approaches 
for inferring cell type proportions 
have been proposed (23). We performed extensive 
benchmarking for multiple cell types 
across several expression datasets (figs. S1 
and S2). The xCell method (24), which estimates 
the enrichment of 64 cell types using 
reference profiles, was most suitable on the 
combined basis of correlation with cell counts 
in blood (fig. S1A), in silico simulations (fig. 
S1B), correlation with expression of marker 
genes for each cell type (fig. S1, C and D), and 
diversity of reference cell types. Concordance 
between methods was generally high (fig. S1, A 
and E). Furthermore, the inferred abundances 
reflected differences in histology (fig. S1C) and 
tissue pathologies (fig. S2). For each cell type, 
we selected tissues where the cell type was 
highly enriched (fig. S3). The xCell scores for 
these tissue-cell type pairs were highly cor- 
related with the probabilistic estimation of 
expression residuals (PEER) factors used to 
correct for unobserved confounders in the 
expression data for QTL mapping (fig. S4A) 
(1) but were generally weakly correlated with 
known technical confounders (fig. S4B), sug- 
gesting that cell type composition accounts 
for a large fraction of intersample variation in 
gene expression. 


Mapping cell type-interaction eQTLs 
and sQTLs 


To identify cis-eQTLs and cis-sQTLs whose 
effect varies depending on the enrichment of 
the cell type, we leveraged the variability in 
cell type composition across GTEx samples 

Fig. 1. Study design for mapping cell type ieQTLs and isQTLs. 
(A) Illustration of 43 cell type-tissue pairs included in the GTEx v8 project. 
The full list of tissues included in the GTEx v8 project is provided in (1); two 
brain regions (frontal cortex and cerebellum) were sampled in replicates. 

Cell types with median xCell enrichment score >0.1 within a tissue were used 
(fig. S2). (B) Schematic representation of a cell type-interaction eQTL 

and sQTL. RNA-seq coverage is depicted in gray, blue, and red, representing 
different genotypes. Differences in coverage between genotypes, corresponding 
to a QTL effect, are only observed with high cell type enrichment. The 

scatter plot illustrates the regression model used to identify iQTLs, where 


to test for an interaction between cell type 
and genotype using a linear regression model 
for either gene expression or splicing (Fig. 1, B 
and C, and fig. S5, A and B) (25). Because QTLs 
identified this way are not necessarily specific 
to the estimated cell type but may reflect an- 
other correlated (or anticorrelated) cell type, 
we refer to these eQTLs and sQTLs as cell type- 
interaction eQTLs (ieQTLs) and cell type- 
interaction sQTLs (isQTLs), respectively (or 
iQTLs in aggregate). 
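
The interaction test described above can be illustrated with a small simulation. What follows is a minimal sketch, not the implementation used in the study: it assumes a model of the form phenotype = b0 + b1·genotype + b2·enrichment + b3·(genotype × enrichment) + noise, omits all covariates, and uses only the Python standard library. The function names, the simulated effect sizes, and the data are hypothetical.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_interaction_model(genotype, enrichment, phenotype):
    """Ordinary least squares fit of phenotype ~ 1 + g + c + g*c.

    Returns the four coefficients and the t statistic of the g*c term.
    """
    rows = [[1.0, g, c, g * c] for g, c in zip(genotype, enrichment)]
    p = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, phenotype)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    resid = [y - sum(b * v for b, v in zip(beta, r)) for r, y in zip(rows, phenotype)]
    sigma2 = sum(e * e for e in resid) / (len(rows) - p)
    # Last column of (X'X)^-1, obtained by solving X'X v = e_4.
    v = solve_linear_system(xtx, [0.0, 0.0, 0.0, 1.0])
    se_interaction = math.sqrt(sigma2 * v[3])
    return beta, beta[3] / se_interaction


# Hypothetical simulated data: the genotype effect grows with enrichment.
random.seed(0)
n_samples = 500
genotype = [float(random.choice([0, 1, 2])) for _ in range(n_samples)]
enrichment = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
phenotype = [
    0.2 * g + 0.3 * c + 0.5 * g * c + random.gauss(0.0, 1.0)
    for g, c in zip(genotype, enrichment)
]

beta, t_stat = fit_interaction_model(genotype, enrichment, phenotype)
print("interaction coefficient:", round(beta[3], 3), "t statistic:", round(t_stat, 2))
```

A large absolute t statistic for the product term indicates that the genotype effect changes with cell type enrichment. As noted in the text, it does not by itself show that the effect is specific to the estimated cell type.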

Across cell types and tissues, we detected 
3347 protein coding and long intergenic non- 
coding RNA (lincRNA) genes with an ieQTL 
[ieGenes (26)] and 987 genes with an isQTL 
(isGenes) at 5% false discovery rate (FDR) per 
cell type-tissue combination (Fig. 2A, figs. S5C 
and S6, and table S1). In the following analy- 
ses, we used ieQTLs and isQTLs identified with 
5% FDR unless indicated otherwise. Whereas 
85% of ieQTLs corresponded to genes with at 
least one standard cis-eQTL [eGenes; we refer 
to cis-eQTLs mapped in bulk tissue as standard 
eQTLs for simplicity (26)], 21% of these 
ieQTLs were not in linkage disequilibrium 
(LD) [coefficient of determination (R²) < 0.2] 
with any of the corresponding eGene’s conditionally 
independent eQTLs (fig. S7, A and B) 
(1). For comparison, the proportion of genes 
with at least one standard eQTL varies as a 
function of sample size (1), with a median of 
42% across tissues (48% in transverse colon 
and 63% in whole blood). This indicates that 
ieQTL analysis frequently reveals genetic regulatory 
effects that are not detected with standard 
eQTL analysis of heterogeneous tissue 
samples. Unlike standard cis-QTL discovery, 
iQTL discovery was only modestly correlated 
with sample size (Spearman’s ρ = 0.53 and 
0.35, for ieQTLs and isQTLs, respectively) (fig. 
S7, C and D). The tissues with most iQTLs included 
blood, as well as transverse colon and 

the dots indicate individual samples. (C) 
(Top) The CNTN1 eQTL effect in skin unexposed to the Sun is associated 
with keratinocyte abundance (P = 4.1 x 
effect in whole blood is associated with neutrophil abundance but is only 
detected in samples with lower neutrophil abundances (P = 6.7 x 10°”). 

Each data point represents an individual and is colored by genotype. Cell 
type enrichment scores and gene expression were inverse normal transformed, 
and intron excision ratios were standardized. The regression lines from 

the interaction model illustrate how the QTL effect is modulated by cell 

breast, which both stratified into at least two 
distinct groups on the basis of histology (27): 
epithelial versus adipose tissue (breast) and 
mucosal versus muscular tissue (colon) (fig. 
S1C). This suggests that interindividual variance 
(which partially reflects variation in biospecimen 
collection) in cell type enrichment 
driven by tissue heterogeneity is a major deter- 
minant in discovery power and benefits iQTL 
mapping despite being a potential confounding 
factor for other types of gene expression analy- 
ses. Down-sampling analyses in whole blood 
and transverse colon revealed linear relation- 
ships between sample size and ieQTL discovery 
in these tissues, suggesting that considerably 
larger numbers of ieQTLs may be discovered 
with larger sample sizes (fig. S7E). ieQTL dis- 
covery was largely robust to the choice of de- 
convolution method, with ~77% of neutrophil 
ieQTLs detected with xCell also detected with 
CIBERSORT, and close to complete replication 

Fig. 2. Cell type ieQTL and isQTL discovery. (A) Number of cell type ieQTLs (left) and isQTLs (right) discovered in each cell type-tissue combination at FDR < 5%. 
Bar labels show the number of ieQTLs and isQTLs, respectively. The color key for the tissues is the same as that of Fig. 1A. (B) Proportion of cell type ieQTLs that validated 
in ASE data. Validation was defined as ieQTLs for which the Pearson correlation between aFC estimates from ASE and cell type estimates was nominally significant 

(P < 0.05). Tissue abbreviations are provided in table S2. Bar labels indicate the number of ieQTLs with validation per number of ieQTLs tested. 


[π1 > 0.99, where π1 is the proportion of true 
positives (28)] (fig. S7F). 
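
For readers unfamiliar with the π1 statistic, the idea can be sketched as follows. This is a simplified illustration, assuming a single fixed tuning threshold (`lam`, hypothetical name) rather than the smoothing over thresholds used by common implementations. The fraction of true nulls π0 is estimated from the density of large P values, and π1 = 1 − π0. The P values in the example are made up.

```python
def pi1_statistic(p_values, lam=0.5):
    """Estimate the proportion of true positives among tested hypotheses.

    Under the null, P values are uniform, so the count above `lam`
    estimates pi0 * m * (1 - lam). The estimate of pi0 is capped at 1.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("at least one P value is required")
    pi0 = sum(1 for p in p_values if p > lam) / (m * (1.0 - lam))
    return 1.0 - min(1.0, pi0)


# Hypothetical replication P values: 80 strong signals plus 20 evenly
# spread values that mimic a uniform null distribution.
replication_p = [1e-6] * 80 + [(i + 0.5) / 20 for i in range(20)]
pi1 = pi1_statistic(replication_p)
print("estimated pi1:", pi1)
```

Unlike a count of tests below a fixed significance threshold, this estimate does not depend on how many tests pass the threshold, which makes it convenient for comparing replication across datasets of different power.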

The QTL effect of ieQTLs and isQTLs can 
increase or decrease as a function of cell type 
enrichment (Fig. 1C and fig. S8A). This cor- 
relation is usually positive (56%; median 
across cell type-tissue combinations). As an 
example, a keratinocyte ieQTL for contactin 1 
(CNTN1) in skin had a stronger effect in 
samples with high enrichment of keratino- 
cytes. However, for some ieQTLs the effect 
was negatively correlated (19%), suggesting 
that the interaction we identified likely cap- 
tures an eQTL that is only active in at least one 
other cell type (fig. S8B). For 24% of ieQTLs, 
the correlation was ambiguous. At a more 
stringent FDR cutoff (FDR < 0.01), the me- 
dian proportion of ieQTLs with ambiguous 
cell type correlation decreased to 11% (fig. S8B, 
right), whereas the proportion of ieQTLs with 
positive correlation increased to 77%. More- 
over, the ieQTLs with ambiguous direction 
tended to have lower minor allele frequency 
(MAF) (fig. S8C), suggesting that at less strin- 
gent FDR, this category might be enriched for 
false positives. 
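
The effect of tightening the FDR cutoff can be illustrated with a generic FDR procedure. The text does not specify which procedure was used, so the sketch below assumes the Benjamini-Hochberg step-up method purely for illustration; the function name and the P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical interaction P values for four genes.
p_values = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(p_values)
n_at_5_percent = sum(1 for q in q_values if q < 0.05)
n_at_1_percent = sum(1 for q in q_values if q < 0.01)
print(q_values, n_at_5_percent, n_at_1_percent)
```

In this toy example all four tests pass at a 5% FDR but none pass at 1%, which mirrors how a more stringent cutoff removes the weakest, and most likely ambiguous, associations.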

Altogether, we identified numerous cell type 
ieQTLs and isQTLs across 43 cell type-tissue 
combinations, including iQTLs that are not 
detected with standard eQTL analysis in bulk 
tissue. These cell type iQTLs pinpoint the cel- 
lular specificity of QTLs that might not nec- 
essarily be specific to the tested cell type but 
may also capture eQTL effects of correlated (or 
anticorrelated) cell types. 


Validation and replication of cell type iQTLs 


Because few external replication datasets exist, 
we used allele-specific expression (ASE) data of 
eQTL heterozygotes (29, 30) to correlate individual- 
level quantifications of the eQTL effect size 
[measured as allelic fold-change (aFC)] with 
individual-level cell type enrichments. If the 
eQTL is active in the cell type of interest, we 
expect to see low aFC in individuals with low cell 
type abundance and higher aFC in individuals 
with high cell type abundance (fig. S9). The cor- 
relation between cell type abundance and aFC 
across heterozygous individuals can thus be used 
as a measure of validation for a specific ieQTL. 
Using this approach, the median proportion 
of ieQTLs with a significant (P < 0.05) aFC-cell 
type Pearson correlation was 0.62 (Fig. 2B). For 
13 cell type-tissue combinations with >20 significant 
ieQTLs, the corresponding π1 statistic 
(28) confirmed the high validation rate (mean 
π1 = 0.75) (fig. S10). Although this approach 
does not constitute formal replication in an 
independent cohort, it is applicable to all tested 
cell type-tissue combinations and corrobo- 
rates that ieQTLs are not statistical artifacts 
of the interaction model. 
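
The validation logic can be sketched in a few lines. This is a hedged illustration rather than the study's pipeline. For each ieQTL it correlates per-individual aFC with cell type enrichment across heterozygotes and counts the ieQTL as validated when the correlation is nominally significant. The P value here uses a normal approximation based on the Fisher z-transform (an assumption made to stay within the standard library; an exact test would use the t distribution), and all names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_two_sided_p(r, n):
    """Approximate two-sided P value for r via the Fisher z-transform."""
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


def validation_rate(ieqtls, alpha=0.05):
    """Fraction of ieQTLs whose aFC correlates with enrichment at P < alpha."""
    validated = 0
    for enrichment, afc in ieqtls:
        r = pearson_r(enrichment, afc)
        if approx_two_sided_p(r, len(afc)) < alpha:
            validated += 1
    return validated / len(ieqtls)


# Hypothetical heterozygous individuals for two ieQTLs.
enrichment = [i / 10 for i in range(20)]
afc_tracking = [0.1 + 0.5 * c + 0.05 * (-1) ** i for i, c in enumerate(enrichment)]
afc_unrelated = [0.3 * (-1) ** i for i in range(20)]

rate = validation_rate([(enrichment, afc_tracking), (enrichment, afc_unrelated)])
print("validation rate:", rate)
```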

Next, we performed replication analyses 
in external cohorts, including whole blood from 
the GAIT2 study (31), purified neutrophils (9), 
adipose and skin tissues from the TwinsUK 
study for ieQTLs (5), and temporal cortex from 
the Mayo RNA sequencing study for both 
ieQTLs and isQTLs (32). Replication rates 
ranged from π1 = 0.32 to 0.67, with the highest 
rate observed in purified neutrophils for whole 
blood (fig. S11). The differences in replica- 
tion rate likely reflect a combination of lower 
power to detect cell type ieQTLs/isQTLs com- 
pared with standard eQTLs/sQTLs, as well 
as differences in tissue heterogeneity across 
studies. Taken together, these results show 
that ieQTLs and isQTLs can be detected with 
reasonable robustness for diverse cell types 
and tissues. 


Cell type ieQTLs contribute 
to tissue specificity 


Next, we sought to determine to what extent 
cell type ieQTLs contribute to the tissue spe- 
cificity of cis-eQTLs. First, we analyzed ieQTL 
sharing across cell types, observing that ieQTLs 
for one cell type were generally not ieQTLs for 
other cell types (for example, myocyte ieQTLs 
in muscle tissues were not hepatocyte ieQTLs 
in liver) (fig. S12A). To determine whether a 
significant cell type interaction effect is asso- 
ciated with the tissue specificity of an eQTL, 
we tested whether cell type ieQTLs are pre- 
dictors of tissue sharing. We annotated the 
top cis-eQTLs per gene across tissues with their 
cell type ieQTL status for the five cell types with 
at least 20 ieQTLs (adipocytes, epithelial cells, 
keratinocytes, myocytes, and neutrophils). This 
annotation was included as a predictor in a 
logistic regression model of eQTL tissue shar- 
ing on the basis of eQTL properties, including 
effect size, minor allele frequency, eGene ex- 
pression correlation, genomic annotations, and 
chromatin state (1). In all five cell types, ieQTL 
status was a strong negative predictor of tissue 
sharing, with the magnitude of the effect 
similar to that of enhancers, indicating that 
ieQTLs are an important mechanism for tissue- 
specific regulation of gene expression (Fig. 3A 
and fig. S12B). Testing whether cell type isQTLs 
are predictors of tissue sharing for four cell 
types with at least 20 isQTLs (adipocytes, epi- 
thelial cells, myocytes, and neutrophils) revealed 
only neutrophil isQTL status as a significant 
negative predictor (fig. S13). This is likely due to 
a combination of lower power to detect isQTLs 
and higher likelihood of splicing-affecting var- 
iants having shared effects if a gene is expressed 
in a tissue or cell type (1). 

We corroborated the finding for ieQTLs using 
multitissue eQTL mapping with MASH (1), 
testing whether eGenes that are tissue specific 
[eQTLs discovered at local false sign rate (LFSR) < 
0.05 only in the tissue type of interest] have a 
higher proportion of cell type ieQTLs compared 
with eGenes that are shared across tissues 
(LFSR < 0.05 in multiple tissues). The propor- 
tion of cell type ieQTLs across all 43 cell type- 
tissue combinations was significantly higher 
in tissue-specific eGenes as compared with 
tissue-shared eGenes (P = 1.9 x 10 °”, one-sided 
Wilcoxon rank sum test) (Fig. 3B), further highlighting 
the contribution of cell type-specific 
genetic gene regulation to tissue specificity of 
eQTLs. For tissues with notably high intersam- 
ple heterogeneity (such as breast, transverse 
colon, and stomach), the above-average enrich- 
ment is likely at least partially driven by higher 
power to detect ieQTLs. 
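
The one-sided rank sum comparison used above can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the normal approximation to the Mann-Whitney U statistic, assigns midranks to ties, and applies neither a tie correction nor a continuity correction. The function name and the two groups of proportions are made up.

```python
import math


def rank_sum_one_sided(x, y):
    """Test whether values in x tend to be larger than values in y.

    Returns the U statistic for x and an approximate upper-tail P value.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return u, 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical per-combination proportions of ieQTLs among eGenes.
tissue_specific = [0.12, 0.15, 0.20, 0.18, 0.22, 0.25, 0.30, 0.17]
tissue_shared = [0.05, 0.08, 0.10, 0.07, 0.11, 0.09, 0.06, 0.04]

u_stat, p_value = rank_sum_one_sided(tissue_specific, tissue_shared)
print("U:", u_stat, "one-sided P:", p_value)
```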

To examine the sharing patterns of cell type 
ieQTLs across tissues, we used two cell types 
with ieQTLs mapped in >10 tissues (16 tissues 
for epithelial cells and 13 for neurons). We 
observed that although standard eQTLs were 
highly shared across the subsets of 16 and 
13 tissues, cell type ieQTLs tended to be highly 
tissue specific, reflected by an average of four 
and five tissues with shared ieQTL effects com- 
pared with 11 and 12 for eQTLs in epithelial and 
brain tissues, respectively (Fig. 3, C and D, left). 
These findings were robust to power differences 
in detecting eQTLs versus ieQTLs, with eQTLs 
remaining predominantly shared even when 
limited to 20% of samples (fig. S14). Of neuron 
ieQTLs, 25.3% were shared between nine brain 
tissues, highlighting that tissues of the cerebrum 
(such as the cortex, basal ganglia, and limbic 
system) show particularly high levels of sharing 

Fig. 4. Cell type iQTLs are enriched for GWAS signals. (A) Distribution of adjusted GWAS fold-enrichment of (top) 23x87 and (bottom) 7x87 tissue-trait 
combinations using the most significant iQTL or standard QTL per eGene or sGene. (B) Adjusted GWAS fold-enrichments of 87 GWAS traits among iQTLs on the x axis 
and standard QTLs on the y axis. Solid circles indicate significant GWAS enrichment among iQTLs at P < 0.05 (Bonferroni-corrected). Colors represent GWAS 


categories of the 87 GWAS traits (table S3). 

compared with that of cerebellar tissues, the 
hypothalamus, and the spinal cord (Fig. 3D, left). 
This pattern was absent when analyzing stan- 
dard eQTLs. Pairwise tissue sharing comparisons 
further confirmed that cell type ieQTLs showed 
greater tissue specificity and more diverse tissue 


sharing patterns than those of standard eQTLs, 
which were broadly shared across all tissues 
(Fig. 3, C and D, middle and right). These re- 
sults show that incorporating cell type compo- 
sition is essential for characterizing the sharing 
of genetic regulatory effects across tissues. 


GWAS and tissue-specific eQTLs and sQTLs 

To study the contribution of cell type-interaction 
QTLs to genome-wide association study (GWAS) 
results for 87 complex traits, we first exam- 
ined the enrichment of iQTLs of each cell type- 
tissue combination for trait associations (GWAS, 
P < 0.05) using QTLEnrich (33). We used 23 and 
7 cell type-tissue pairs (19 and 7 distinct tissues, 
respectively) with >100 ieQTLs or isQTLs, re- 
spectively, at a relaxed FDR of 40% to generate 
robust enrichment estimates of 87 GWAS traits. 
Across all tested cell type-tissue trait pairs, the 
GWAS signal was clearly enriched among ieQTLs 
and isQTLs (1.3 and 1.4 median fold enrich- 
ments, respectively), similarly to standard eQTLs 
and sQTLs (Fig. 4A and table S4). The GWAS 
enrichments were robust to the iQTL FDR 
cutoffs (fig. S15, A and B). 

We next analyzed the enrichments of the 
individual traits for iQTLs of two cell types that 
we estimated had the largest number of ieQTLs: 
neutrophil iQTLs in blood and epithelial cell 
iQTLs in transverse colon. We compared them 
with the corresponding standard QTLs (Fig. 
4B and fig. S15, C and D), focusing on traits 
that had a significant enrichment for either 
QTL type (Bonferroni-adjusted P < 0.05). In 
blood, we observed a significant shift toward 
higher enrichment for ieQTLs (one-sided, paired 
Wilcoxon rank sum test; P = 0.0026) and es- 
pecially isQTLs (P = 2.8 x 10°), which ap- 
pears to be driven by GWAS for blood cell 
traits, and also immune traits having a higher 
enrichment for iQTLs. The higher iQTL signal 
is absent in colon (ieQTL, P = 1; isQTL, P = 
0.13), even though the standard QTL enrich- 
ment for blood cell traits appears to be similar 
for blood and colon. This pattern suggests that 
cell type-interaction QTLs may have better 
resolution for indicating relevant tissues and 
cell types for complex traits as compared with 
tissue QTLs, but further studies are needed to 
fully test this hypothesis. 

Next, we asked whether cell type iQTLs can 
be linked to loci discovered in GWASs and used 
to pinpoint their cellular specificity. To this 
end, we tested 13,702 ieGenes and 2938 isGenes 
(40% FDR) for colocalization with 87 GWAS 
traits (1), using both the cell type ieQTL/isQTL 
and corresponding standard QTL; 1370 (10.3%) 
cell type ieQTLs and 89 (3.7%) isQTLs colocal- 
ized with at least one GWAS trait (Fig. 5, A and 
B, and tables S5 and S6). The larger number of 
colocalizations identified for neutrophil ieQTLs 
and isQTLs in whole blood relative to other cell 
type-tissue pairs likely reflects a combination 
of the larger number of ieQTLs and isQTLs 
and the abundance of significant GWAS loci 
for blood-related traits in our set of 87 GWASs 
(Fig. 5B). 

Our analysis revealed a substantial pro- 
portion of loci for which only the ieQTL/isQTL 
colocalizes with the trait (467 of 1370, 34%) 
(Fig. 5B), or where the joint colocalization of 
the ieQTL/isQTL and corresponding standard 
eQTL indicates the cellular specificity of the 
trait as well as its potential cellular origin (401 
of 1370, 29%) (Fig. 5B). For example, a colocal- 
ization between the DExH-box helicase 58 
(DHX58) gene in the left ventricle of the heart 
and an asthma GWAS was only identified 
through the corresponding myocyte ieQTL 
[posterior probability of colocalization (PP4) = 
0.64] but not the standard eQTL (PP4 = 0.00) 
(Fig. 5C). Cardiac cells such as cardiomyocytes 
are not primarily viewed to have a causal role in 
asthma, but their presence along pulmonary 
veins and their potential contribution to al- 
lergic airway disease have been described (34). 

An example in which both the standard eQTL 
and the cell type ieQTL colocalize with the trait 
is given in Fig. 5C for KREMEN1 in adipocytes in 
subcutaneous adipose tissue and a birth weight 
GWAS (PP4 ~ 0.8); KREMEN1 has been linked 
to adipogenesis in mice (35). We highlight two 
analogous examples for isQTLs: The epithelial 
cell isQTL for CDHR5 in small intestine colocal- 
ized with eosinophil counts, whereas the stan- 
dard sQTL did not (Fig. 5D), and conversely, 
both the standard sQTL and myocyte isQTL 
for ATP5SL in the left ventricle of the heart 
colocalized with standing height (Fig. 5D). Ad- 
ditional examples of ieQTLs and isQTLs colocal- 
izing with trait associations are provided in 
figs. S16 and S17. Although the iQTLs do not 
necessarily pinpoint the specific cell type where 
the regulatory effect is active, they indicate that 
cell type specificity plays a role in the GWAS 
locus. Together, our colocalization results indi- 
cate that cell type-interaction QTLs yield new 
potential target genes for GWAS loci that are 
missed by standard QTLs and provide hypothe- 
ses for the cellular specificity of regulatory effects 
underlying complex traits. 


Discussion 


By mapping interaction effects between cell 
type enrichment and genotype on the tran- 
scriptome across GTEx tissues, we provide an 
atlas of thousands of eQTLs and sQTLs that 
are likely to be cell type-specific. The ieQTLs 
and isQTLs we report here include several 
immune and stromal cell types in tissues where 
cell type-specific QTLs have not been charac- 
terized in prior studies. Cell type ieQTLs are 
strongly enriched for tissue and cellular speci- 
ficity and provide a finer resolution to tissue 
specificity than that of bulk cis-QTLs that are 
highly shared between tissues. Given the en- 
richment of GWAS signal in cell type iQTLs for 
cell types potentially relevant to the traits, and 
the large fraction of colocalizations with GWAS 
traits that are only found with cell type iQTLs, 
exhaustive characterization of cell type-specific 
QTLs is a highly promising approach toward 
a mechanistic understanding of these loci, 
complementing experimental assays of variant 
function. However, the substantial allelic heterogeneity 
observed in standard QTLs (1) and 
limited power to deconvolve QTLs that are spe- 
cific to rare cell types or with weak or opposing 
effects indicate that many more cell type- 
specific QTLs exist beyond those that can be 
currently computationally inferred from bulk 
tissue data. We therefore anticipate that upcoming 
population-scale single-cell QTL studies 
will be essential to complement the approaches 
presented here. However, because those data 
are still difficult to obtain for many tissues, our 
demonstration of the insights gained from cell 
type iQTLs indicates that improving decon- 
volution approaches and increasing sample 
sizes will be valuable in this effort and enable 
discoveries for cell types and tissues not con- 
sidered in this study. 


Methods summary 


The GTEx version 8 (v8) data (1) were used for all 
analyses. Cell type enrichments were computed 
with xCell (24). Interaction QTL mapping was 
performed with tensorQTL (36). Full methods 
are available in (26). 

INTRODUCTION: Telomeres are DNA-protein 
complexes located at the end of chromo- 
somes that protect chromosome ends from 
degradation and fusion. The DNA component 
of telomeres shortens with each cell divi- 
sion, eventually triggering cellular senescence. 
Telomere length (TL) in blood cells has been 
studied extensively as a biomarker of human 
aging and risk factor for age-related diseases. 
The extent to which TL in whole blood reflects 
TL in disease-relevant tissue types is unknown, 
and the variability in TL across human tissues 
has not been well characterized. The postmor- 
tem tissue samples collected by the Genotype- 
Tissue Expression (GTEx) project provide an 
opportunity to study TL in many human tissue 
types, and accompanying data on inherited 


genetic variation, gene expression, and donor 
characteristics enable us to examine demo- 
graphic, genetic, and biologic determinants and 
correlates of TL within and across tissue types. 


RATIONALE: To better understand variation in 
and determinants of TL, we measured relative 
TL (RTL, telomere repeat abundance in a DNA 
sample relative to a standard sample) in more 
than 25 tissue types from 952 GTEx donors 
(deceased, aged 20 to 70 years old). RTL was 
measured for 6391 unique tissue samples 
using a Luminex assay, generating the largest 
publicly available multitissue TL dataset. We 
integrated our RTL measurements with data 
on GTEx donor characteristics, inherited ge- 
netic variation, and tissue-specific expression 
and analyzed relationships between RTL
and covariates using linear mixed models
(across all tissues and within tissues). Through
this analysis, we sought to accomplish four
goals: (i) characterize sources of variation in
TL, (ii) evaluate whole-blood TL as a proxy
for TL in other tissue types, (iii) examine the
relationship between age and TL across tissue
types, and (iv) describe biological determinants
and correlates of TL.

TL in human tissues. Using a Luminex-based assay, TL was measured in DNA samples from >25 different
human tissue types from 952 deceased donors in the GTEx project. TL within tissue types is determined
by numerous factors, including zygotic TL, age, and exposures. TL differs across tissues and correlates
among tissue types. TL in most tissues declines with age.


RESULTS: Variation in RTL was attributable to 
tissue type, donor, and age and, to a lesser 
extent, race or ethnicity, smoking, and inher- 
ited variants known to affect leukocyte TL. 
RTLs were generally positively correlated 
among tissues, and whole-blood RTL was a 
proxy for RTL in most tissues. RTL varied 
across tissue types and was shortest in whole 
blood and longest in testis. RTL was inversely 
associated with age in most tissues, and this 
association was strongest for tissues with 
shorter average RTL. African ancestry was 
associated with longer RTL across all tissues 
and within specific tissue types, suggesting 
that ancestry-based differences in TL exist in 
germ cells and are transmitted to the zygote. 
A polygenic score consisting of inherited var- 
iants known to affect leukocyte TL was asso- 
ciated with RTL across all tissues, and several 
of these TL-associated variants affected ex- 
pression of nearby genes in multiple tissue 
types. Carriers of rare, loss-of-function var- 
iants in TL-maintenance genes had shorter 
RTL (based on analysis of multiple tissue 
types), suggesting that these variants may 
contribute to shorter TL in individuals from 
the general population. Components of telo- 
merase, a TL maintenance enzyme, were more 
highly expressed in testis than in any other 
tissue. We found evidence that RTL may 
mediate the effect of age on gene expression 
in human tissues. 


CONCLUSION: We have characterized the var- 
iability in TL across many human tissue types 
and the contributions of aging, ancestry, ge- 
netic variation, and other biologic processes to 
this variability. The correlation observed among 
TL measures from different tissues highlights 
the existence of host factors with effects on TL 
that are shared across tissue types (e.g., TL 
in the zygote). These results have important 
implications for the interpretation of epidemi- 
ologic studies of leukocyte TL and disease. 


The list of author affiliations and a full list of the GTEx authors 
and their affiliations are available in the full article online. 
*Corresponding author. Email: brandonpierce@uchicago.edu 
Cite this article as K. Demanelis et al., Science 369, eaaz6876 
(2020). DOI: 10.1126/science.aaz6876 

Telomere shortening is a hallmark of aging. Telomere length (TL) in blood cells has been studied
extensively as a biomarker of human aging and disease; however, little is known regarding variability 
in TL in nonblood, disease-relevant tissue types. Here, we characterize variability in TLs from 6391 
tissue samples, representing >20 tissue types and 952 individuals from the Genotype-Tissue Expression 
(GTEx) project. We describe differences across tissue types, positive correlation among tissue types, 
and associations with age and ancestry. We show that genetic variation affects TL in multiple tissue 
types and that TL may mediate the effect of age on gene expression. Our results provide the 
foundational knowledge regarding TL in healthy tissues that is needed to interpret epidemiological
studies of TL and human health.

Telomeres are DNA-protein complexes located
at the end of chromosomes that
protect chromosome ends from degradation
and fusion (1). The length of the
DNA component of telomeres, a six-nucleotide
repeat sequence, shortens as cells
divide (2), with short telomeres eventually 
triggering cellular senescence (3, 4). In most 
human tissues, telomere length (TL) gradually 
shortens over time, and TL shortening is 
considered a hallmark (and a potential under- 
lying cause) of human aging (5). In human 
studies, short TL measured in leukocytes is 
associated with increased risk of aging-related 
diseases, including cardiovascular disease 
(6) and type 2 diabetes (7), as well as overall 
mortality and human life span (8). However, 
long TL may increase the risks for some types 
of cancer (9-11). Leukocyte TL is influenced by 
inherited genetic variation [single-nucleotide 
polymorphisms (SNPs)], some of which reside 
near genes with known roles in telomere maintenance
(12-15). Leukocyte TL is also associated
with lifestyle factors (e.g., physical activity), 
health factors (e.g., obesity, cholesterol), and 
environmental exposures (e.g., cigarette smok- 
ing) (16, 17). 

Epidemiologic studies of TL predominantly 
use blood (occasionally saliva) as a DNA source. 
Thus, our understanding of variation in TL, 
its determinants (e.g., demographic, lifestyle, 
and genetic factors), and its associations with 
disease phenotypes almost entirely rely on TL 
measured in leukocytes from whole blood (WB). 
Few studies have compared TL in leukocytes 
with TL in other human tissue types; those 
that have are relatively small (<100 participants; 
<5 tissue types) but provide evidence that TL 
differs across tissue types and that TL mea- 
surements from different tissue types are cor- 
related (18, 19). Thus, larger studies of many 
additional tissue types are needed to gain a 
comprehensive understanding of variation in 
TL and its determinants within and across a 
wide range of human tissues and cell types. 

To address these gaps in our understanding 
of TL and its role in disease risk and its relation- 
ship with age, we measured TL in >6000 unique 
tissue samples, representing >20 distinct tis- 
sue types and >950 individual donors from the 
Genotype-Tissue Expression (GTEx) project 
version 8 (v8) (20). In this work, we (i) char- 
acterize sources of variation in TL, (ii) eval- 
uate leukocyte TL as a proxy for TL in other 
tissues, (iii) examine the relationship between 
age and TL across tissue types, and (iv) de- 
scribe biological determinants and correlates 
of TL. This work presents results from tissue- 
specific and pan-tissue TL analyses that are 
crucial for improving our understanding of
the etiologic role of TL in aging and chronic 
disease. 

We attempted measurement of relative TL 
(RTL, the telomere repeat abundance relative 
to a standard reference DNA sample) for 
7234 tissue samples from 962 GTEx donors 
using a Luminex-based assay (27). After re- 
moving 836 samples with failed RTL mea- 
surements and seven RTL measurements that 
were within-tissue outliers, our analytic data- 
set included 6391 tissue-specific RTL measure- 
ments from 952 donors, with 24 different 
tissue types having ≥25 RTL measurements
(table S1). Each donor provided only one RTL 
measurement per tissue type, and on average, 
each donor had RTL measured in seven dif- 
ferent tissue types (range: 1 to 26 tissue types) 
(fig. S1). The median donor age was 55 (range: 
20 to 70) years. The majority of donors were 
male (67%) and of European descent (85%), 
and there were more postmortem donors (54%) 
than organ donors (table S1). Extensive valid- 
ation and characterization of the Luminex- 
based RTL assay are described in (27). 


TL varies across (and correlates among) 
human tissue types 


We estimated the contribution of tissue type 
to the variation in RTL using linear mixed 
models (LMMs) adjusted for fixed effect covariates
[age, sex, body mass index (BMI), race
and ethnicity category, donor ischemic time, 
and technical factors, represented by plate (e.g., 
batch effects, DNA quality and concentration)] 
and with random effects representing tissue 
type and donor (table S2) (27). On average, 
RTL was the shortest in WB and longest in 
testis, with testis being an outlier tissue type 
[analysis of variance (ANOVA), p < 2 × 10⁻¹⁶
compared with all other tissues] (Fig. 1A). Tis- 
sue type explained 24.3% of the variation in 
RTL across all tissues but only 11.5% when testis 
was excluded, indicating that tissue type ac- 
counts for substantial variability in human TL. 

We examined Pearson pairwise correlations 
in RTL among tissue types with tissue pairs 
from the same donor, restricting to 20 tissue types
with TL data for ≥75 samples (Fig. 1B). Forty-one
tissue-pair correlations passed a Bonferroni
p value threshold (t tests, p < 3 × 10⁻⁴), and all
41 correlations were positive (table S3). Tissue 
pairs from the same organ were among the 
strongest correlations observed: sun-exposed 
and nonexposed skin [Pearson correlation coefficient
(r) = 0.24, t test, p = 9 × 10⁻³, n = 112],
transverse and sigmoid colon (Pearson r =
0.40, t test, p = 8 × 10⁻⁷, n = 139), and esophagus
mucosa (EM) and gastric junction (EGJ)
(Pearson r = 0.22, t test, p = 3 × 10⁻³, n = 188).
After applying hierarchical clustering to these 
pairwise correlations with average linkage, 
tissue RTLs separated into three clusters (Fig. 
1B and fig. S2). Two clusters were characterized 

Fig. 1. TLs differ across human tissue types but are correlated among tissue types. (A) Distribution
of RTL across 24 GTEx tissue types (ordered by median RTL) (see table S1). Nine-hundred fifty-two donors
contributed one or more tissue samples to the analysis, and the sample size for each tissue type
corresponds to unique donors (i.e., no donors are represented twice for a given tissue type).
(B) Pearson (r) correlations between RTL measures from different tissue types. Tissues included have
≥75 samples and were not sex specific. Red, yellow, and blue correspond to r = 1, 0, and -1, respectively.
Black boxes are results from hierarchical clustering (three clusters). (Exact correlations are in
table S3.) (C) Theoretical framework describing determinants of TL across human tissue types.
(D) Pearson correlations between WB RTL and tissue-specific RTL measurements (with 95% confidence intervals).
Panel (C) labels. Sources of correlation in telomere length among tissue types: (1) common cellular
origin (zygote), (2) age-related TL decline, (3) identical germline polymorphisms (i.e., TL-maintenance
SNPs). Sources of differences in telomere length among tissue types: cell division/turnover rates,
disease status, exposures (DNA damage), inflammation, TL maintenance. Adult tissues (mostly
differentiated cells).

by common developmental origin: (i) mesodermal
and ectodermal (e.g., muscle and skin)
and (ii) endodermal origin tissues (e.g., stom- 
ach and lung). Thyroid and brain cerebellum 
formed the third cluster. Similar clustering 
patterns among tissue types were observed 
for females (fig. S3) and males (fig. S4), where 
testis was also an outlying tissue type and 
clustered with thyroid. The positive correla- 
tions observed among most tissue types are 
likely due to the fact that the initial TL in the 
zygote affects TL in all adult tissues through 
mitotic inheritance. Differences in tissue- 
type TL and the extent of correlation among 
tissue-type TLs are likely attributable to var- 
iability in both intrinsic (e.g., cell division 
rate and history, telomere maintenance) and 
extrinsic (e.g., response to environmental ex- 
posures) factors across tissues (Fig. 1C). To 
assess the possibility that extrinsic factors 
could modify the correlation between TL in 
different tissues, we assessed the overall differ- 
ence in the correlation matrix by smoking his- 
tory and obesity (as an indicator of disease 
status and health). In this exploratory anal- 
ysis, the observed pairwise correlations among 
tissue types did not substantially differ between 
obese and normal or overweight donors. How- 
ever, among individuals with a history of smok- 
ing, the correlation among tissue types was 
somewhat stronger compared with never- 
smokers (Jennrich’s chi-square test, p = 0.003), 
but the underlying reason for this observa- 
tion is unknown. 
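
The pairwise correlation screen described earlier in this section can be sketched in a few lines. This is a minimal illustration rather than the authors' pipeline: the function names are hypothetical, the family-wise alpha of 0.05 is an assumption, and the count of 190 tests assumes that every pair among 20 tissue types was tested (the number actually tested may be smaller if some pairs shared too few donors).

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r = 0."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p value threshold; alpha = 0.05 is an assumed default."""
    return alpha / n_tests


# All unordered pairs among 20 tissue types (assumed upper bound on tests).
n_pairs = len(list(combinations(range(20), 2)))
threshold = bonferroni_threshold(n_pairs)  # 0.05 / 190, about 2.6e-4

# Example: the transverse and sigmoid colon pair quoted in the text.
t_colon = t_statistic(0.40, 139)
print(n_pairs, threshold, round(t_colon, 2))
```

The t statistic is then compared against a t distribution with n - 2 degrees of freedom; the standard library has no t distribution, so that final step is left to a statistics package.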


WB TL is a proxy for TL in other tissues 


WB RTL was positively correlated (Pearson 
correlation, t test, p < 0.05) with tissue-specific
RTL measurements from 15 out of 23 tissue
types (n ≥ 25 for each test), with Pearson correlations
ranging from 0.15 to 0.37 (Fig. 1D).
These results demonstrate that WB TL is a 
proxy for TL in many tissue types. WB RTL 
captured between 2% (testis) and 14% (tibial 
nerve) of the variation in RTL measured in 
other tissue types. Adjustment for age, sex, 
BMI, and donor ischemic time did not have a 
major impact on the associations observed 
between WB RTL and tissue-type RTL in the 
23 tissue types (fig. S5). Notably, tibial nerve 
RTL had the strongest correlation with WB 
RTL. The GTEx tibial nerve samples largely 
contain connective tissue, Schwann cells, and 
the axons of neuron cells (which do not con- 
tain the DNA from neuron cells), and the strong 
correlation between tibial nerve RTL and WB 
RTL is likely due to the fact that the tibial 
nerve tissue and WB have connective tissue 
origins. Breast and ovary RTL had negative 
point estimates for their correlations with WB 
RTL, but the 95% confidence intervals over- 
lapped zero. The relationships between the 
RTL from these tissue types and WB RTL 
require further investigation. 
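
The share of variation that WB RTL captures in another tissue follows from squaring the correlation coefficient. The short check below assumes that the 2% and 14% figures are squared Pearson correlations and uses the two endpoint correlations quoted above; the function name is illustrative.

```python
def variance_explained(r):
    """Fraction of variance in one variable linearly explained by the other."""
    return r * r


low = variance_explained(0.15)   # weakest quoted correlation, about 2%
high = variance_explained(0.37)  # strongest quoted correlation, about 14%
print(f"{low:.3f} {high:.3f}")
```

Even the strongest correlation therefore leaves most of the between-donor variation in tissue RTL unexplained by WB RTL.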

RTL measurements have inherent measurement
error (22), including our Luminex assay
(23), and this error can attenuate the strength 
of the correlation observed between RTL mea- 
surements taken from two different tissue 
types. To better understand this error, we con- 
ducted extensive validation and characteriza- 
tion of our Luminex-based assay, including 
comparisons to TL measured by Southern blot of 
terminal restriction fragments (TRFs) reported 
previously by Pierce et al. (23) and conducted 
within GTEx (27). Based on this validation 
work (23), we conclude that the percentage
of variation in our Luminex RTL measures
that is due to (nondifferential) measurement 
error is <50%. The true percentage cannot be 
estimated because the extent of measurement 
error in our gold standard TL measure, South- 
ern blot analysis of TRFs, is unknown. There- 
fore, we used simulated data to estimate the 
impact of measurement error (ranging from 
0 to 50% of the variation in RTL) on the cor- 
relations between RTL measurements from 
different tissues (27). Our results show that 
the correlations observed in this study will be 
attenuated, and this attenuation will increase 
with increasing error in the RTL measurements 
(fig. S6). 
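
The direction and rough size of this attenuation can be illustrated with the classical measurement-error model, in which each observed value is the true value plus independent noise. This is an assumption made for illustration; the authors' simulation is described in their supplementary methods and may differ. Under that model the observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities.

```python
import math


def attenuated_r(true_r, err_x, err_y):
    """Observed correlation under classical (independent, nondifferential) error.

    err_x and err_y are the fractions of observed variance that are due to
    measurement error in each of the two measurements.
    """
    if not (0.0 <= err_x < 1.0 and 0.0 <= err_y < 1.0):
        raise ValueError("error fractions must lie in [0, 1)")
    return true_r * math.sqrt((1.0 - err_x) * (1.0 - err_y))


# Hypothetical true correlation of 0.5, with equal error in both tissues.
for err in (0.0, 0.1, 0.25, 0.5):
    print(err, round(attenuated_r(0.5, err, err), 3))
```

With equal error fractions in both tissues, the observed correlation shrinks in direct proportion to the error-free share of variance, so 50% error would halve a true correlation.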

In addition to validating our Luminex RTL 
measurements against TL measured using 
Southern blot, we have also validated these 
measurements against RTL measured using 
quantitative polymerase chain reaction (qPCR) 
(24), both in previous work (25) and using GTEx 
samples (27). Within GTEx, RTL measurements 
from qPCR (24) and TL measured from South- 
ern blot (26) showed strong correlation with 
our Luminex RTL measurements and similar 
differences among tissue types as observed for 
the Luminex RTL measurements (Fig. 1A) (27). 


TL varies among individuals 
and by participant characteristics 


TL varied across individuals (donors) (Fig. 2A, 
top), with 8.7% of the variation in RTL attrib- 
utable to variability among individuals (esti- 
mates obtained from an adjusted LMM) (table 
S2). This percentage increased to 11.2% when
testis was excluded. After adjusting for tissue 
type and donor (as random effects), age ex- 
plained 3.3% (among all tissues) and 4.4% 
(excluding testis) of variation in RTL, whereas 
BMI, TL-associated SNPs, smoking status, and 
race and ethnicity category each explained 
<1% of the variation across all tissues [marginal 
coefficient of determination (R²), likelihood-ratio
test (LRT), p < 0.05] (Fig. 2B, top), demonstrating
that these factors contribute to
pan-tissue TL dynamics. We observed no clear 
association between sex and RTL across all 
tissues (table S2), and sex showed weak evi- 
dence of association with RTL in tissue-specific 
analyses (table S4). Multiple prior studies have 
reported an association between longer leukocyte
TL and female sex (27). However, we may
be underpowered to detect this association 
for WB RTL, considering some larger studies 
have failed to detect it (28) and the association 
may be less evident at younger ages (29). The 
lack of association across all tissue types points 
to the possibility that this sex difference for 
leukocyte TL may not be consistent across all 
tissue types. RTL was shorter among (ever) 
smokers compared with never-smokers in lung 
and in WB (LRT, p < 0.05) (fig. S7), consistent 
with prior studies of leukocyte TL (30). 

We conducted a principal component (PC) 
analysis of RTL from 11 nonreproductive tissue 
types (each with n ≥ 200 samples) from 750
participants (27) and generated a composite 
measure of TL for each donor on the basis of 
the first PC that explains 51% of the variation 
in TL among these tissue types (Fig. 2A, bottom).
We observed that age, BMI, and smoking status 
were associated with shorter composite RTL 
and explained 13.7, 1.3, and 0.6%, respectively, 
of the variation in this composite TL measure 
(Fig. 2B, bottom). Race and ethnicity category 
was associated with longer composite TL in 
African Americans compared with European 
Americans and explained 1.6% of the varia- 
tion in composite TL. This composite TL likely 
reflects variation in TL present in the zygote 
(and in tissues during early development) that 
is mitotically inherited by cells in adult tissues. 
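
A composite measure of this kind can be sketched as the donor scores on the first principal component of a donors-by-tissues RTL matrix. The sketch below is an assumption-laden illustration, not the authors' procedure: it requires a complete matrix (no missing tissues), uses the covariance of mean-centered values without rescaling, and finds the first PC by power iteration. The function name is hypothetical.

```python
import math


def first_pc_scores(rows, n_iter=200):
    """Return (scores, share) for the first PC of a complete data matrix.

    rows: one list of RTL values per donor, same tissue order in every row.
    share: fraction of total variance explained by the first PC.
    The sign of the scores is arbitrary, as for any principal component.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration; assumes the start vector is not orthogonal to PC1.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    total = sum(cov[i][i] for i in range(p))
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return scores, eigenvalue / total


# Toy data: four donors, two perfectly collinear tissues.
scores, share = first_pc_scores([[1, 2], [2, 4], [3, 6], [4, 8]])
print(round(share, 3))
```

In real data the share is well below 1 (51% in the analysis described above), and a numerical library would normally replace the hand-written power iteration.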


TL is longer in genomes of African ancestry 


To further explore differences in TL by race 
and ethnicity category, we first confirmed that 
PCs derived from genome-wide SNP data (n = 
838 donors), representing genetic ancestry, 
showed clear clustering by reported race and 
ethnicity category among donors (Fig. 2C, 
inset). Genetic ancestry (European versus 
African) explained 0.6% of the variation in 
RTL across all tissues (marginal R², LRT, p =
1 x 10°°) after adjusting for tissue type and 
donor as random effects and 2.3% of the var- 
iation in composite RTL (F test, p = 7 x 10°). 
After including adjustments for age, sex, 
donor ischemic time, technical factors, and 
random effects of tissue type and donor, RTL 
was longer among individuals of African an- 
cestry compared with individuals of European 
ancestry across all tissue types (LRT, p = 0.007), 
demonstrating that the effect of ancestry on 
TL, reported previously for leukocyte TL (31-34),
extends to TL in other tissue types. The adjusted 
association between African ancestry and RTL 
was positive for 16 out of 19 tissues tested, with 
LRT p values <0.05 for brain cerebellum (p = 
0.03), thyroid (p = 0.02), prostate (p = 0.03), 
lung (p = 0.02), and WB (p = 0.005) (Fig. 2C 
and table S5). The observation that individuals 
of African ancestry have longer TL in many 
tissue types is consistent with the hypothesis 
that ancestry-based differences in TL are present 
early in development (35) and potentially in 
germ cells (preconception). In other words,
our results suggest that offspring (zygotes)
inherit telomeres from germ cells that vary
in TL because of ancestry, and these ancestry-based
differences in TL are mitotically transmitted
to daughter cells, and eventually to cells
in many adult tissue types. This “direct transmission”
of TL from parent to offspring (36)
would result in the observed ancestry-based
differences across many tissue types (summarized
in Fig. 2D). One likely cause of this
ancestry-based difference is natural selection
on SNPs known to affect TL (37), although selection
on TL itself could also contribute.

Fig. 2. TL varies among individuals and by ancestry. (A) Distribution of RTL
across GTEx donors ranked by donors’ mean RTL across all measured tissue
types (top) and distribution of a “composite RTL” measure (bottom), estimated
as the first PC from a PC analysis (PCA) of 11 tissue types (21). Colors correspond
to GTEx tissue type. (B) Contribution of selected covariates to variability in
RTL across all tissues (top) and composite RTL (bottom). For the analysis across
all tissues, estimates were extracted as marginal R² values from LMMs
adjusted for tissue type and donor as random effects. (C) Distribution of RTL
measures for individuals of European ancestry (EA) and African ancestry
(AA). Tissue types are ranked by the largest difference between median RTL of
the two ancestry groups. The inset shows genotyping PCs, demonstrating
consistent clustering of individuals by genetically predicted ancestry. Sample-size
information and associations between African ancestry and RTL are
reported in table S5. (D) Schematic describing the direct inheritance of TL from
parental germ cells and expected relationship to TL across adult tissue types
for individuals of African and European ancestry. Genetic (and reported race
and ethnicity category) ancestry was color coded for African (red) and European
(blue) in (C) and (D).


TL is correlated with age in most tissues 

Of 24 tissues with ≥25 samples, RTL was negatively
correlated (Pearson r < 0) with age in
21 tissue types (p < 0.05 in 14 tissue types
from t test) (Fig. 3A and fig. S8), providing
new evidence to support the hypothesis that
age-related TL shortening occurs in most
tissue types. The strongest correlations with
age were observed for WB (Pearson r = -0.35,
t test, p = 2 x 10°, n = 637) and stomach (r =
-0.37, t test, p = 7 x 10°”, n = 420) (table S6).
Age explained more of the variation in RTL
for tissues with shorter mean RTL [coefficient
of determination (r²) = 0.23, F test, p =
0.02] (Fig. 3B). The association between age
and RTL differed by sex for hippocampus
(t test, p_interaction = 0.04), transverse colon
(t test, p_interaction = 0.01), and lung (t test,
p_interaction = 0.04), suggesting that TL
shortening with age is greater in men compared
with women in some tissues. Among
tissue types for which RTLs did not have a
clear correlation with age (t test, p > 0.05), we
examined whether RTL differed among 5-year 
age groups, but we observed no age-related 
differences in RTL for testis, ovary, cerebel- 
lum, vagina, skeletal muscle, thyroid, and 
EGJ (ANOVA, p > 0.05). Although prior 
studies have observed longer TL in sperm 
from older men (38), we did not observe a clear 
increasing (or decreasing) trend for testis RTL 
with increasing age (fig. S9). 

Among tissue types for which RTL was correlated
with age (t test, p < 0.05), the strength
of association varied across tissue types (Fig.
3C and table S6). To further explore the hypothesis
that TL shortens at different rates
in different tissue types, we calculated the
difference in RTL (ΔRTL) between all pairs
of tissue types available for each donor. We
constructed 155 ΔRTL variables, restricting to
tissue pairs with complete data for ≥50 donors.
The Pearson correlation between ΔRTL
and age was estimated for each tissue-type pair
to determine if the ΔRTL varies with age (fig.
S10). Forty-two of the 155 ΔRTL variables were
correlated with age (Pearson correlation, t test,
p < 0.05), and the absolute values of these correlations
ranged from 0.12 to 0.38 (table S7).
Four of the ΔRTLs surpassed a Bonferroni
p value of 3 × 10⁻⁴: EGJ and stomach (r =
0.32, t test, p = 1 × 10⁻⁵, n = 176), WB and thyroid
(r = 0.30, t test, p = 3 × 10⁻⁵, n = 182), EM and
stomach (r = 0.25, t test, p = 3 × 10⁻⁵, n =
276), and WB and ovary (r = 0.33, t test, p = 2 ×
10⁻⁴, n = 120). Our results indicate that age
explains up to 14% of the variation in the
difference in RTL between pairs of tissue types.
A prior study of 87 adults reported that the
rate of age-related TL shortening was similar
for muscle, leukocytes, fat, and skin (i.e., no
association between age and ΔRTLs), concluding
that age-related TL loss within stem
cells is consistent across adult tissue types
(18). When we examined these tissue types
among our ΔRTL pairs (n ≥ 50), age was correlated
with ΔRTL for skeletal muscle and
blood (r = 0.36, t test, p = 2 × 10⁻³, n = 68) but
less for skin (unexposed) and blood (r = 0.09,
t test, p = 0.20, n = 197) and skin (exposed)
and blood (r = 0.08, t test, p = 0.24, n = 200).
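
The construction of the between-tissue RTL differences and their correlation with age can be sketched as follows. The record layout (a list of donor dictionaries with an age and a tissue-to-RTL mapping) and the function names are assumptions made for illustration; only the minimum of 50 donors per tissue pair comes from the text.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def delta_rtl_age_correlations(donors, min_donors=50):
    """Correlate the within-donor RTL difference of each tissue pair with age.

    donors: list of {"age": number, "rtl": {tissue_name: rtl_value}}.
    Returns {(tissue_a, tissue_b): (r, n_donors)} for pairs that have
    complete data in at least min_donors donors.
    """
    tissues = sorted({t for d in donors for t in d["rtl"]})
    results = {}
    for a, b in combinations(tissues, 2):
        shared = [d for d in donors if a in d["rtl"] and b in d["rtl"]]
        if len(shared) < min_donors:
            continue
        deltas = [d["rtl"][a] - d["rtl"][b] for d in shared]
        ages = [d["age"] for d in shared]
        results[(a, b)] = (pearson_r(deltas, ages), len(shared))
    return results


# Toy data with a lowered donor minimum, purely to show the mechanics.
toy = [
    {"age": 30, "rtl": {"a": 1.0, "b": 1.0, "c": 0.5}},
    {"age": 40, "rtl": {"a": 1.0, "b": 0.9, "c": 0.6}},
    {"age": 50, "rtl": {"a": 1.0, "b": 0.8}},
]
res = delta_rtl_age_correlations(toy, min_donors=3)
print(res)
```

A positive correlation for a pair (a, b) means that tissue b loses RTL faster with age than tissue a. With 155 such tests, a Bonferroni threshold at an assumed alpha of 0.05 is 0.05 / 155, or about 3.2e-4, and a correlation of 0.38 corresponds to 0.38 squared, or about 14% of variance explained.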


Leukocyte TL—associated genetic variants 
and TL in other tissues 


Prior genome-wide association studies (GWASs) 
have identified SNPs associated with leuko- 
cyte TL (12-15). We constructed a weighted 
polygenic SNP score for each donor using nine 
leukocyte TL-associated SNPs (27), with higher 
score reflecting longer TL (table S8) (39). We
examined the association between this polygenic
SNP score and RTL for tissue types with
≥100 samples. After adjustment for age, sex,
genotyping PCs, donor ischemic time, and tech- 
nical factors as a random effect, an associa- 
tion with the SNP score (LRT, p < 0.05) was 
observed for WB RTL (p = 0.007) (fig. S11), 
cerebellum RTL (p = 0.03), pancreas RTL (p = 
0.04), and transverse colon RTL (p = 0.02) 
(Fig. 4A, fig. S12, and table S9). Among these 
18 tissue types, 16 had positive association 
estimates [binomial test (Mp = 0.5), p = 0.001]. 
In analyses of all tissue types, RTL was posi- 
tively associated with the SNP score (LRT, p = 
0.01) after adjustments. These results indicate 
that at least some of the genetic variants (or 
regions) that affect leukocyte TL also affect TL 
in other tissue types. 
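
Two calculations in this section lend themselves to a short sketch: a weighted polygenic score and the sign test on the direction of the tissue-specific estimates. The SNP identifiers and weights below are hypothetical placeholders, not the nine SNPs or effect sizes used in the study, and the two-sided form of the binomial test is an assumption.

```python
from math import comb


def polygenic_score(dosages, weights):
    """Weighted sum of TL-lengthening allele counts.

    dosages: {snp_id: number of TL-lengthening alleles (0, 1, or 2)}.
    weights: {snp_id: per-allele effect on TL from a prior GWAS}.
    """
    return sum(weights[snp] * dosages[snp] for snp in weights)


def binomial_two_sided_p(k, n):
    """Exact two-sided p value for k successes in n trials under p0 = 0.5."""
    tail = sum(comb(n, i) for i in range(max(k, n - k), n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical donor with two placeholder SNPs.
score = polygenic_score({"snp_a": 2, "snp_b": 1},
                        {"snp_a": 0.10, "snp_b": 0.05})

# 16 of 18 tissue types with a positive association estimate.
p_sign = binomial_two_sided_p(16, 18)
print(round(score, 2), round(p_sign, 4))
```

Under these assumptions the sign test gives a p value of about 0.0013, in line with the rounded value of 0.001 reported above.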


TL-associated variants influence 
local gene expression 


Among the nine regions known to harbor SNPs 
associated with leukocyte TL, we examined 
whether these SNPs also affect local gene ex- 
pression in GTEx tissue types and cell lines 
(21). Colocalization analysis can be used to de- 
termine if a common causal variant affects a 
trait (e.g., TL) and expression of a nearby gene 
(40). If there is a common causal variant under- 
lying both association signals, then we may 
infer that SNPs may influence TL via effects on 
gene expression. We used colocalization anal- 
ysis to estimate the probability that a common 
causal variant underlies association signals 
for leukocyte TL (from GWASs) (12-15) and cis- 
eQTL (expression quantitative trait loci) as- 
sociation signals from GTEx (v8) analyses (20). 
Colocalization results indicated that at least 
six of the nine TL-associated regions shared a
common causal variant with a cis-eQTL in at
least one tissue type, on the basis of a posterior
probability of colocalization of >80% across all
three sets of priors tested (Fig. 4, B and C; fig.
S13; and table S10).

Fig. 3. Age is negatively correlated with TL in most tissues, and correlation
is strongest in tissues with shorter telomeres. (A) Pearson correlations
between age and tissue-specific RTL measures. (B) Scatterplot of mean RTL
for each tissue versus the percent variation explained by age (r²) for each
tissue. The size of each point is proportional to sample size for that tissue
type. (C) Relationship between RTL and age for five selected tissue types
[WB, lung, stomach, transverse colon, and skin (exposed)]. For all plots,
colors correspond to tissue type.

The association signal for TL on chromo- 
some 19 (represented by rs8105767) showed 
strong evidence of colocalization with an eQTL 
affecting expression of gene ZNF257 in eight 
tissue types, including skin (sun exposed), trans- 
verse colon, and stomach (Fig. 4B). ZNF257 en- 
codes a zinc-finger protein that may be involved 
in transcriptional regulation. The association 
signal for TL on chromosome 10 (represented 
by rs9420907) colocalized with an eQTL affecting
expression of STN1 in seven tissue types,
including skin (sun exposed), transverse colon,
and EM (Fig. 4C). Additional TL-associated
loci showed colocalization with GTEx eQTLs
for NAF1, MYNN, RP11-109N23.6, and TSPYL6
(fig. S13 and table S10). Although these colocalizations
were observed for eQTLs in tissue
types with largely differentiated cells, eQTLs 
observed in induced pluripotent stem cells 
have been shown to be largely shared with 
eQTLs in GTEx tissue types (41). This finding 
suggests that the observed evidence of co- 
localization may be pertinent to TL mainte- 
nance within stem and progenitor cells, which 
have active telomerase activity. Notably, NAF1
encodes a protein involved in telomere assembly,
and loss-of-function (LOF) mutations in this gene
are associated with shorter telomere length in
pulmonary fibrosis (PF) patients (42). These results
suggest that TL-associated loci influence
TL within human tissues through regulation
of the expression of genes known to be involved
in telomere maintenance (e.g., STN1, NAF1) (12),
as well as genes whose role in telomere maintenance
is unclear (e.g., ZNF257).

Notably, we observed little evidence of co- 
localization of the TERT or TERC TL-associated 
regions with any cis-eQTLs. TERT and TERC 
are important components of telomerase. The 
telomerase enzyme can extend the telomere 
repeat sequence, typically in stem and/or pro- 
genitor cells, to compensate for TL shortening; 
however, TERT and TERC have low or unde- 
tectable expression in a majority of adult GTEx 
tissue samples. This suggests that eQTL studies 
of cells from stem and/or developmental tissues
may be needed to understand the mechanisms
underlying genetic regulation of TERT
and TERC expression.

Fig. 4. Inherited genetic variation affects telomere length in multiple tissue
types and expression of nearby genes. (A) Associations between a polygenic SNP
score for leukocyte TL and tissue-specific RTL measures. Colors correspond to
tissue type. (B) Leukocyte TL association signal from GWASs colocalizes with a
cis-eQTL for ZNF257 (~40 kb upstream of ZNF208). The top plot shows results from
the ENGAGE Consortium GWAS of leukocyte TL, and the bottom three plots correspond
to cis-eQTL results from GTEx tissues: skin—sun exposed, colon—transverse,
and stomach. chr19, chromosome 19. (C) Leukocyte TL association signal
colocalizes with a cis-eQTL for STN1 (also known as OBFC1 in human genome
reference hg19). The top plot corresponds to results from the ENGAGE Consortium
GWAS of leukocyte TL, and the bottom three plots correspond to cis-eQTL results
from GTEx tissues: skin—sun exposed, EM, and colon—transverse. (D) Distribution
of composite RTL (based on PC1 from PCA of 11 tissue types) (left) and tissue
type RTL (right), with highlighted dots representing GTEx donors carrying a
rare LOF variant in a telomere maintenance gene previously implicated in TBDs.
LOF variants are noted in the legend. The black horizontal line corresponds
to median composite RTL and tissue type RTL. The tissue types presented contain
one or more LOF carriers, and colors correspond to tissue type.

Fig. 5. TL is associated with telomerase subunit gene expression and may mediate the effect of age
on gene expression. (A) RTL plotted against TERC, TERT, or DKC1 expression across tissue types. Colors
correspond to GTEx tissue types. (B) Analyses addressing the hypothesis that TL mediates the effect of
age on expression of specific genes. Scatterplots show estimates of the proportion of the effect of age on
gene expression mediated by RTL (for each gene) and the -log10(p value) corresponding to the average
causal mediation effect of RTL (for each gene). Results are presented for all age-associated genes in each of
the three selected tissue types (WB, lung, and EM). The mediation p value was obtained using a
nonparametric bootstrapping approach (n = 10,000 bootstraps).


Carriers of rare LOF variants 
may have shorter TL 


Telomere biology disorders (TBDs, e.g., PF, 
dyskeratosis congenita, aplastic anemia) are 
characterized by short TL in affected individuals
owing to inherited LOF mutations in telomere
maintenance genes (1, 43-45). Individuals
with TBDs often present with early-onset aging- 
related phenotypes—such as immune dys- 
function, bone failure, liver disease, and lung 
function decline—and these effects can inform 
our understanding of how TL contributes to 
aging in the general population. Using whole- 
genome sequencing data from GTEx donors, 
we searched for LOF rare variants in seven 
genes that have evidence of autosomal dom- 
inant (or partial dominant) inheritance in 
relation to TBDs (e.g., TERC, TERT, TINF2, 
RTEL1, PARN, ACD, and NAF1). We identified
four donors carrying a rare exonic variant 
(minor allele frequency <1%) resulting in a 
predicted LOF frameshift insertion or deletion 
or a stop-gain mutation (Fig. 4D). These LOF 
carriers had shorter TL across all tissues (LRT, 
p = 0.04) and shorter composite TL (t test, p =
0.03). One donor carried a stop-gain variant in 
TERT, and their composite TL was among the 
lowest observed (~first percentile), consistent 
with prior studies of TERT mutations among 
individuals with PF (46, 47).

Our results suggest that rare variants in
TL-maintenance genes may contribute to shorter
TL in multiple tissues in the general population
(i.e., primarily individuals without TBDs).
However, the PARN and RTEL1 mutation carriers
among the GTEx donors did not have RTL values
in the (lower) extreme of the composite TL and
tissue-specific RTL distribution(s). Although
mutations in TL maintenance genes and very
short TL are often found in individuals with
TBDs (43-45), prior studies of individuals with
TBDs have shown that TL can vary substantially
among carriers (of mutations in PARN, RTEL1,
and TERT), and some carriers have TL values
similar to noncarriers (46, 48, 49). Prior
studies of PF patients suggest that LOF TERT
mutations may have a larger impact on TL than
LOF mutations in PARN or RTEL1 (46, 47, 49).


TL is associated with telomerase subunit 
expression across tissues 


The protein products of TERT, TERC, and DKC1
comprise the telomerase catalytic subunit. We
examined the association between RTL and
expression of these genes using 3885 GTEx
tissue samples with both RTL and RNA
sequencing (RNA-seq) gene expression data (v8).
TERT and TERC expression was detectable
[i.e., transcripts per million (TPM) >0.1] in 28%
(n = 1089) and 20% (n = 783) of these samples,
respectively, but DKC1 was ubiquitously
expressed (n = 3885) in all samples (table S11).
Whereas DKC1 showed correlation with both
TERT (Pearson r = 0.30, t test, p < 2 × 10^-16,
n = 1089) and TERC (r = 0.23, t test,
p = 3 × 10^-11, n = 783) across all samples, the
correlation between TERT and TERC expression
across samples was stronger (r = 0.49, t test,
p < 2 × 10^-16, n = 364) (fig. S14). Testis had
substantially higher mean expression of TERT and
TERC compared with all other tissues (ANOVA,
p < 2 × 10^-16) (table S11), but there was no
association between testis RTL and TERT
or TERC expression. Across all tissues, RTL
was positively correlated with TERT (r = 0.58,
t test, p < 2 × 10^-16, n = 1089), TERC (r = 0.33,
t test, p < 2 × 10^-16, n = 783), and DKC1 (r =
0.29, t test, p < 2 × 10^-16, n = 3885) (Fig. 5A).
When testis was removed, the correlation
decreased substantially for both TERT (r = 0.14,
p = 4 × 10^-5, n = 890) and DKC1 (r = 0.23,
p < 2 × 10^-16, n = 3686) and disappeared for TERC
(r = 0.02, p = 0.63, n = 617). After adjustment
for covariates and random effect of tissue
type, RTL showed a positive association with
increasing quartiles of TERT expression (LRT,
p = 0.005 including testis and p = 0.002
excluding testis) and of DKC1 expression (LRT,
p = 0.001 including testis and p = 3 x 10*
excluding testis) across all tissues. Overall, these
results support the following: (i) high telomerase
activity in testis (i.e., spermatocytes) likely
contributes to longer TL observed in that tissue, and
(ii) GTEx tissue samples consist primarily of
differentiated cells, which typically have little to
no telomerase activity, resulting in minimal
detectable association between telomerase
activity in those cells and the observed TL (50, 51).


TL may mediate the effect of age 
on gene expression 


Aging affects gene expression, so we examined 
whether TL mediates the association between 
age and expression of age-associated genes. 
We analyzed the association between age and 
RNA-seq-based gene expression levels among 
tissues with ≥150 samples and selected three
tissue types with >1000 age-associated genes 
[false discovery rate (FDR) of 0.05] (27): WB 
(n = 5239), lung (n = 1366), and EM (n = 6024)
(Fig. 5B). Using mediation analysis (52), we 
estimated the proportion of the effect of age 
on expression that was mediated by TL for 
each age-associated gene. For each tissue type, 
we observed substantially more positive than 
negative estimates of the “proportion medi- 
ated” (Fig. 5B), as expected under the hypoth- 
esis that TL is a mediator. (An equal number of 
positive and negative estimates are expected 
under the hypothesis of no mediation.) If TL 
is a mediator for a specific gene, then adjust- 
ment for TL will attenuate the association be- 
tween age and gene expression. We observed 
evidence that RTL mediated the effect of age 
on expression for 607 genes (12%) in WB, 224 
genes (16%) in lung, and 1177 genes (20%) in 
EM (p_mediation < 0.05 and proportion mediated
> 0) (tables S12 to S14). In these tissue types,
RTL mediated between 4 and 34% of the effect
of age on expression of individual genes;
however, full mediation will be detected as partial
mediation in the presence of measurement
error (for either the mediator or the outcome)
(53). We evaluated the enrichment of these
RTL-mediated genes in gene ontology (GO) terms
among the age-associated genes (Fisher’s exact
test, FDR < 0.1). Enriched GO terms were
identified for lung (5 terms), EM (30 terms), and WB
(108 terms) (tables S15 to S17). No GO terms
(FDR < 0.1) were common to WB, lung, and EM
for any ontology. Among 108 enriched GO terms
in WB, several terms related to apoptosis, cell
death, and telomere DNA binding were
identified. The results from this analysis provide
evidence that TL is a potentially relevant
biologic factor in the mediation of age on gene
expression and may contribute to processes
related to biologic aging.

Fig. 6. TL and TERT expression are associated with estimated stem cell features. (A) Estimated
proportion of stem cells within tissues and its relationship with mean RTL (left) and mean
TERT expression (right). (B) Estimated number of divisions per stem cell (per year) within
tissues and its relationship with mean RTL (left) and mean TERT expression (right). Colors
correspond to GTEx tissue types, and the size of each point reflects the sample size of the
tissue type. Pearson correlations and corresponding p values are reported. Analysis included
nonreproductive tissues only.
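
The "proportion mediated" quantity discussed in this section can be illustrated with a minimal sketch. It assumes the simple difference method with ordinary least squares (total effect of age minus direct effect of age after adjusting for TL, divided by the total effect); this is not the bootstrapped causal mediation procedure of (52). The simulated donors, the effect sizes, and the names `simulate_donors`, `total_effect`, `direct_effect`, and `proportion_mediated` are hypothetical and exist only for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def total_effect(age, expr):
    """Slope of age in the regression expr ~ age."""
    return cov(age, expr) / cov(age, age)


def direct_effect(age, tl, expr):
    """Coefficient of age in the regression expr ~ age + tl (two-predictor OLS)."""
    saa, stt, sat = cov(age, age), cov(tl, tl), cov(age, tl)
    say, sty = cov(age, expr), cov(tl, expr)
    return (say * stt - sty * sat) / (saa * stt - sat ** 2)


def proportion_mediated(age, tl, expr):
    """(total - direct) / total: share of the age effect that passes through TL."""
    c = total_effect(age, expr)
    c_prime = direct_effect(age, tl, expr)
    return (c - c_prime) / c


def simulate_donors(n=2000, seed=0):
    """Hypothetical data: age shortens TL, and both age and TL affect expression."""
    rng = random.Random(seed)
    age = [rng.uniform(20, 70) for _ in range(n)]
    tl = [-0.02 * a + rng.gauss(0, 0.2) for a in age]
    expr = [2.0 * t - 0.03 * a + rng.gauss(0, 0.3) for a, t in zip(age, tl)]
    return age, tl, expr


age, tl, expr = simulate_donors()
print(f"proportion mediated: {proportion_mediated(age, tl, expr):.2f}")
```

In this simulation the indirect path contributes -0.04 of a total age effect of -0.07 per year, so the estimate should fall near 0.57. Adding measurement noise to `tl` would pull the estimate toward zero, which is the partial-mediation caveat noted above.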

Tissue-level stem cell features are associated
with TL and TERT expression

After extracting tissue-specific estimates of 
the number of divisions per stem cell (per year) 
and the proportion of stem cells (among all 
cells) for specific tissue types from Tomasetti 
and Vogelstein et al. (54, 55), we examined their 
relationship with mean RTL and mean TERT 
expression among nonreproductive GTEx tis- 
sue types (n = 12; table S18). No associations 
were identified between mean TERC and DKC1
expression and these stem cell features. Mean
RTL was positively correlated with estimated
proportion of stem cells within a tissue type
(r = 0.71, t test, p = 0.01) (Fig. 6A), and this
association persisted after adjustment for
number of divisions per stem cell (t test, p =
0.008) and mean TERT expression (t test, p =
0.02). We did not observe a clear association
between mean TERT expression and the
estimated proportion of stem cells within a tissue
type. These results suggest that tissue types 
with a higher proportion of stem cells in their 
cellular composition may have longer TL mea- 
surements in bulk tissues as a consequence. 

We observed a positive correlation between 
mean TERT expression and the number of
divisions per stem cell (r = 0.76, t test, p = 0.004)
(Fig. 6B). This association persisted after
adjustment for the proportion of stem cells within
a tissue type (t test, p = 0.006) and mean RTL
(t test, p = 0.01). Mean RTL showed suggestive
evidence of correlation with the number of
divisions per stem cell (r = 0.39, t test, p =
0.21), and when we restricted to nonblood
tissue types, mean RTL was positively correlated
with number of divisions per stem cell (r =
0.65, t test, p = 0.03). This finding suggests that
tissue types that undergo more cellular turn- 
over and replacement, such as colon, may have 
higher telomerase expression to maintain TL 
in the stem cell compartments. 
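
As a rough check on how the correlations reported in this section relate to their t tests, the sketch below converts a Pearson r into a t statistic with n - 2 degrees of freedom. It assumes the standard t test for a correlation coefficient and that each correlation was computed across all 12 nonreproductive tissue types; the function name `pearson_t` is illustrative only.

```python
import math


def pearson_t(r, n):
    """t statistic (n - 2 degrees of freedom) for the null hypothesis rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Two-sided 5% critical value of Student's t distribution with 10 degrees of freedom.
T_CRIT_DF10 = 2.228

reported = [
    ("mean RTL vs. proportion of stem cells", 0.71),
    ("mean TERT vs. divisions per stem cell", 0.76),
    ("mean RTL vs. divisions per stem cell", 0.39),
]

for label, r in reported:
    t = pearson_t(r, 12)
    print(f"{label}: r = {r:.2f}, t = {t:.2f}, "
          f"beyond 5% cutoff: {abs(t) > T_CRIT_DF10}")
```

With only 12 tissue types, a correlation must be fairly large to clear the cutoff, which is why r = 0.39 is not distinguishable from zero here whereas r = 0.71 and r = 0.76 are.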


Cell-type composition is associated 
with TL within tissues 


To determine whether TL varies among the 
cell types within a given tissue sample, we ex- 
amined the association between RTL and es- 
timated cell-type enrichment scores (CTES) 
[generated using RNA-seq data and the xCell 
software (56)]. Seven CTES (for adipocytes, 
epithelial cells, hepatocytes, keratinocytes, myo- 
cytes, neurons, and neutrophils) were bench- 
marked by the GTEx Consortium (57), and we 
examined the association between these seven 
CTES and RTL in tissue types with ≥100
samples (n = 16 tissue types). After removing cell
types not detected within a tissue type (n =
37 total CTES tested across 16 tissue types) and
adjusting for age and sex, we identified eight
associations (t test, p < 0.05) between CTES
and RTL among 37 associations tested (fig. S15). 
In exploratory analyses, we examined all 
64 CTES provided by xCell that had a detection 
p value <0.05 for >90% samples within a 
tissue type. Restricting to tissue types with 
≥300 samples that had both CTES and RTL
data (WB, lung, and EM), there were 27, 24, 
and 17 CTES detected in each tissue, respec- 
tively (fig. S16). EM and lung had 13 and 
14 CTES that were associated with RTL, after 
adjustment for age and sex (t test, p < 0.05).
RTL was positively associated with epithelial 
cell, smooth muscle cell, keratinocyte, and sebo- 
cyte CTES in both lung and EM (p < 0.05). 
Notably, five CTES were inversely associated 
with RTL (p < 0.05) in both lung and EM, 
including fibroblasts and endothelial cells. 
In WB, lymphoid and myeloid cell CTES ac- 
counted for 70% of the CTES detected, and 
eight CTES were associated with RTL (t test,
p < 0.05). Neutrophil CTES were positively
associated with RTL. Both CD8+ T cell CTES
were inversely associated with RTL, consistent
with prior work examining cell types and TL 
in blood (58). These results provide evidence 
that TL varies across cell types within a given 
tissue, and consequently, cell-type composi- 
tion can affect TL measurement in human 
tissues. 


TL across all tissues is associated with 
age-related chronic disease status 


Using medical history data from GTEx donors, 
we examined the association between com- 
mon age-related chronic diseases and RTL 
within and across tissues. A history of type 2 
diabetes (22% of donors) was associated with 
shorter RTL across all tissues (LRT, p = 0.02) 
as well as shorter pancreas RTL (p = 0.07) and 
coronary artery RTL (p = 0.01) (fig. S17). Among
all donors, 50% had no history of any chronic 
disease, and 30, 14, and 6% had a history of 
one, two, and three (or more) chronic diseases, 
respectively. Chronic disease burden (sum of 
chronic diseases from 0 to 5) was associated 
with shorter RTL across all tissues (LRT, p = 
0.008) and in testis, coronary artery, kidney 
cortex, and cerebellum (LRT, p < 0.05 for each). 
When we excluded cancer from the chronic 
disease burden, these associations persisted 
across all tissues (LRT, p = 0.02) and in all 
tissues listed above except for kidney cor- 
tex (LRT, p = 0.09). These observations sug- 
gest that TL may capture some aspect of the 
biologic age-related health decline across 
tissues. 

We did not observe any associations be- 
tween RTL and history of cancer; however, to 
test the hypothesis that normal tissues with 
relatively short (or long) TL are also short (or 
long) in tumors occurring in that tissue, we 
compared the mean tissue-to-WB TL ratio for 
each GTEx tissue with the mean tumor-to-WB 
TL ratio in corresponding cancer types from 
The Cancer Genome Atlas (TCGA) (21, 59). The 
mean cancer TL ratio from TCGA and normal 
TL ratio from GTEx were positively correlated 
(r = 0.44, t test, p = 0.04, n = 23) (fig. S18), pro- 
viding support for this hypothesis. 

After reviewing the medical and death re- 
port information for diseases and conditions 
related to TBDs (21), we identified six donors 
with a reported history of PF and/or intersti- 
tial lung disease (ILD). Five of these donors 
had TL measurements (n = 35 tissue-type sam- 
ples). We observed that three of the donors with 
a history of PF or ILD had composite RTL 
below the fifth percentile (fig. S19). A history 
of PF or ILD was associated with shorter TL 
across all tissues (LRT, p = 0.02) and shorter 
composite RTL (t test, p = 0.01). Notably, we
observed that within tissues, the median RTL 
was substantially shorter for WB (Mann- 
Whitney U test, p = 0.02), pancreas (p = 0.01), 
and EM (p = 0.05) among donors with a
history of PF or ILD.

Discussion

This study provides a view of the substantial 
variation in human TL that exists across hu- 
man tissue types and among individuals. We 
show that TL is generally positively correlated 
across human tissue types, and that WB TL is a
proxy for tissue-specific TL for many tissues, a 
finding that may support the use of blood TL 
as a proxy for TL in some tissues in large epi- 
demiological studies. TL was negatively as- 
sociated with age in the majority of tissues 
studied, confirming the hypothesis of perva- 
sive age-related telomere shortening in most 
human tissues. However, our results suggest 
that the rate of shortening can vary across tis- 
sues, and age explained more variation in TL 
in tissues with shorter mean TL. TERT and 
TERC expression were low or undetectable in 
most tissues and not associated with TL within 
any tissue, likely because progenitor cells, which 
express telomerase, are not present in large 
numbers in adult tissue samples, which con- 
sist primarily of differentiated cells. Notably, 
testicular TL was ~1.5- to 2.5-fold longer than 
TL in any other tissue type, and TERT was ex- 
pressed in 100% of these samples and at higher 
levels than in any other tissue, consistent with 
the predominance of spermatogenic cells in 
testis (i.e., cells developing from germ cells 
into spermatozoa), which have high telomer- 
ase activity (57). 

RTL measured in a tissue sample is an av- 
erage of the TLs among all chromosomes within 
a heterogeneous population of cell types with 
different cell division rates and history, stem 
cell composition, and oxidative and inflamma- 
tory environments. To characterize variation 
in TL within specific cell types, cell type-specific 
and single-cell TL studies are needed, poten- 
tially using interphase quantitative fluores- 
cence in situ hybridization approaches (60) 
and flow cell cytometry to isolate specific cell 
types, including stem cells. 

A large proportion of the variation in RTL 
was unexplained across all tissue types, poten- 
tially attributed to sources such as cell-type 
composition (e.g., stem and progenitor cells), 
measurement error, and lifestyle and envi- 
ronmental factors with variable effects across 
tissues. From our simulation-based analysis of 
the impact of TL measurement error on our 
results, we show that random measurement 
error biases our estimate of the true corre- 
lation in TL between two tissues toward zero, 
suggesting that the correlations presented in 
this study are attenuated compared with their 
true associations. 
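
The attenuation argument above can be sketched with a small simulation. This is not the simulation used in the study; it assumes standard-normal true TL values in two tissues, independent Gaussian measurement error of equal variance in both, and hypothetical parameter values. Under those assumptions the observed correlation is expected to be about the true correlation multiplied by the measurement reliability (here 0.6 × 0.5 = 0.3).

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=5000, true_r=0.6, error_sd=1.0, seed=1):
    """Return (correlation of true TL, correlation of error-prone TL)."""
    rng = random.Random(seed)
    tissue_a, tissue_b = [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        # Construct b so that corr(a, b) equals true_r in expectation.
        b = true_r * a + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
        tissue_a.append(a)
        tissue_b.append(b)
    # Add independent random measurement error to each tissue's TL.
    noisy_a = [v + rng.gauss(0, error_sd) for v in tissue_a]
    noisy_b = [v + rng.gauss(0, error_sd) for v in tissue_b]
    return pearson_r(tissue_a, tissue_b), pearson_r(noisy_a, noisy_b)


r_true, r_observed = simulate()
print(f"true r = {r_true:.2f}, observed r with measurement error = {r_observed:.2f}")
```

Setting `error_sd=0.0` recovers the true correlation, and larger values shrink the observed correlation further toward zero.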

We lack detailed exposure data (e.g., smok- 
ing and alcohol use) for GTEx donors; studies 
that can link human tissue samples to environ- 
mental and lifestyle histories are needed to 
better understand environmental determi- 
nants of TL across different tissues and cell 
types. As of now, all TL-associated SNPs have
been identified in GWASs of leukocyte TL
(12-15); our study suggests that some of 
these effects are also present in other tissue 
types, but larger studies of tissue-specific TL 
measurements are needed to characterize how 
these effects vary across tissues and cell types. 
Identifying variants that affect TL in all or 
most cell types (e.g., variants with effects on 
TL that may be present during development 
or in stem cells in multiple tissue types) may 
be ideal for evaluating the causal impact of 
TL on risk for a wide array of diseases (oc- 
curring in diverse tissues or cell types) using 
Mendelian randomization. TL shortening is 
an important hallmark of aging in human 
tissues, but TL should also be studied in con- 
junction with other hallmarks of aging. Char- 
acterizing the relationships among TL and 
other aging-related processes and biomarkers 
within and across tissues will improve our 
understanding of cellular aging and its impact 
on human health. 


Methods summary 


We measured RTL in 6391 samples from 952 
GTEx donors using a Luminex-based method. 
These measurements were validated against 
other TL measurement methods, including TL 
measured using Southern blot of TRFs (fig. 
S20) (26), relative TL measured using qPCR
(fig. S21) (24), and TL estimated from
whole-genome sequencing data (fig. S22) (61).
Publicly available GTEx donor covariate, genotyping,
and RNA-seq gene expression data (all v8) were 
integrated into our analyses. We applied LMMs 
to examine the relationships of RTL with age, 
genetic ancestry, gene expression of telomer- 
ase components, estimates of cell types, and 
other covariates across and within tissue types. 
Using GTEx genotyping data, we constructed 
a weighted polygenic SNP score for each do- 
nor using nine leukocyte TL-associated SNPs 
identified from the ENGAGE GWAS of leukocyte
TL (12) and examined colocalization of
these GWAS association signals with local gene 
expression using summary statistics from the 
ENGAGE study and eQTL results from the 
GTEx Consortium. Mediation analyses were 
applied to examine the extent to which TL 
mediates the effect of age on gene expression. 
Estimates of stem cell division and proportion 
of stem cells were extracted from prior studies 
(54, 55) for corresponding GTEx tissues, and 
their relationship with average RTL and TERT 
expression was examined. 
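
A weighted polygenic SNP score of the kind described above can be sketched as a weighted sum of allele dosages. The sketch assumes that each weight is a per-allele effect estimate from a GWAS and that dosages count copies of the TL-associated allele; the SNP identifiers and weights below are placeholders, not the nine ENGAGE SNPs or their estimates.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies per SNP) for one donor."""
    missing = set(weights) - set(dosages)
    if missing:
        raise ValueError(f"missing genotypes for: {sorted(missing)}")
    return sum(weights[snp] * dosages[snp] for snp in weights)


# Hypothetical per-allele weights (placeholders, not published effect sizes).
weights = {"snp_A": 0.10, "snp_B": 0.05, "snp_C": 0.08}

# Hypothetical donor genotype: copies of the TL-associated allele at each SNP.
donor = {"snp_A": 2, "snp_B": 0, "snp_C": 1}

score = polygenic_score(donor, weights)
print(f"polygenic score: {score:.2f}")
```

Raising an error on missing genotypes, rather than silently skipping a SNP, keeps scores comparable across donors; imputing missing dosages would be an alternative design.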

Transcriptomic signatures across human tissues
identify functional rare genetic variation


Nicole M. Ferraro*, Benjamin J. Strober*, Jonah Einson, Nathan S. Abell, Francois Aguet,
Alvaro N. Barbeira, Margot Brandt, Maja Bucan, Stephane E. Castel, Joe R. Davis, Emily Greenwald,
Gaelen T. Hess, Austin T. Hilliard, Rachel L. Kember, Bence Kotis, YoSon Park, Gina Peloso,
Shweta Ramdas, Alexandra J. Scott, Craig Smail, Emily K. Tsang, Seyedeh M. Zekavat, Marcello Ziosi,
Aradhana, TOPMed Lipids Working Group, Kristin G. Ardlie, Themistocles L. Assimes, Michael C. Bassik,
Christopher D. Brown, Adolfo Correa, Ira Hall, Hae Kyung Im, Xin Li, Pradeep Natarajan, GTEx Consortium,
Tuuli Lappalainen, Pejman Mohammadi†‡, Stephen B. Montgomery†‡, Alexis Battle†‡


INTRODUCTION: The human genome contains 
tens of thousands of rare (minor allele fre- 
quency <1%) variants, some of which contrib- 
ute to disease risk. Using 838 samples with 
whole-genome and multitissue transcriptome 
sequencing data in the Genotype-Tissue
Expression (GTEx) project version 8, we assessed
how rare genetic variants contribute to
extreme patterns in gene expression (eOutliers),
allelic expression (aseOutliers), and alternative
splicing (sOutliers). We integrated these three
signals across 49 tissues with genomic
annotations to prioritize high-impact rare variants
(RVs) that associate with human traits.

Transcriptomic signatures identify functional rare genetic variation. We identified genes in individuals
that show outlier expression, allele-specific expression, or alternative splicing and assessed enrichment of
nearby rare variation. We integrated these three outlier signals with genomic annotation data to prioritize
functional RVs and to intersect those variants with disease loci to identify potential RV trait associations.


RATIONALE: Outlier gene expression aids in iden- 
tifying functional RVs. Transcriptome sequenc- 
ing provides diverse measurements beyond 
gene expression, including allele-specific ex- 
pression and alternative splicing, which can 
provide additional insight into RV functional 
effects. 


RESULTS: After identifying multitissue eOutliers, 
aseOutliers, and sOutliers, we found that outlier 
individuals of each type were significantly more 
likely to carry an RV near the corresponding 
gene. Among eOutliers, we observed strong en- 
richment of rare structural variants. sOutliers 
were particularly enriched for RVs that dis- 
rupted or created a splicing consensus sequence. 
aseOutliers provided the strongest enrichment 
signal when evaluated from just a single tissue. 

We developed Watershed, a probabilistic 
model for personal genome interpretation that 
improves over standard genomic annotation- 
based methods for scoring RVs by integrating 
these three transcriptomic signals from the 
same individual and replicates in an indepen- 
dent cohort. 

To assess whether outlier RVs identified in 
GTEx associate with traits, we evaluated these 
variants for association with diverse traits in 
the UK Biobank, the Million Veterans Program, 
and the Jackson Heart Study. We found that 
transcriptome-assisted prioritization
identified RVs with larger trait effect sizes and was
a better predictor of effect size than genomic
annotation alone.


CONCLUSION: With >800 genomes matched 
with transcriptomes across 49 tissues, we were 
able to study RVs that underlie extreme changes 
in the transcriptome. To capture the diversity 
of these extreme changes, we developed and 
integrated approaches to identify expression, 
allele-specific expression, and alternative splic- 
ing outliers, and characterized the RV landscape 
underlying each outlier signal. We demonstrate 
that personal genome interpretation and RV 
discovery are enhanced by using these signals.
This approach provides a new means to in- 
tegrate a richer set of functional RVs into mod- 
els of genetic burden, improve disease gene 
identification, and enable the delivery of pre- 
cision genomics. 


The list of author affiliations and a full list of the GTEx authors 
and their affiliations are available in the full article online. 
*These authors contributed equally to this work. 

†These authors contributed equally to this work.
‡Corresponding author. Email: pejman@scripps.edu (P.M.);
smontgom@stanford.edu (S.B.M.); ajbattle@jhu.edu (A.B.) 
Cite this article as: N. M. Ferraro et al., Science 369, 
eaaz5900 (2020). DOI: 10.1126/science.aaz5900


Rare genetic variants are abundant across the human genome, and identifying their function and 
phenotypic impact is a major challenge. Measuring aberrant gene expression has aided in identifying 
functional, large-effect rare variants (RVs). Here, we expanded detection of genetically driven 
transcriptome abnormalities by analyzing gene expression, allele-specific expression, and alternative 
splicing from multitissue RNA-sequencing data, and demonstrate that each signal informs unique classes 
of RVs. We developed Watershed, a probabilistic model that integrates multiple genomic and 
transcriptomic signals to predict variant function, validated these predictions in additional cohorts and 
through experimental assays, and used them to assess RVs in the UK Biobank, the Million Veterans 
Program, and the Jackson Heart Study. Our results link thousands of RVs to diverse molecular effects 
and provide evidence to associate RVs affecting the transcriptome with human traits. 


Background 

The human genome contains tens of thousands
of rare [minor allele frequency (MAF)
<1%] variants (1), some of which contribute to
rare and common disease risks (2, 3). How- 
ever, identifying functional rare variants (RVs), 
especially in the noncoding genome, remains 
difficult because of their low frequency and 
the lack of a regulatory genetic code. Outlier 
gene expression aids in identifying functional, 
large-effect RVs (4-8). Furthermore, transcrip- 
tome sequencing provides diverse measure- 
ments beyond gene expression level, including 
allele-specific expression (ASE) and alterna- 
tive splicing, that have yet to be systematically 
evaluated and integrated into variant effect 
prediction (9-11). 

Using 838 samples with both whole-genome
and transcriptome sequencing data in the
Genotype-Tissue Expression (GTEx) project
version 8 (v8), we assessed how rare genetic
variants contribute to outlier patterns in total
expression (hereafter referred to simply as
“expression”), allelic expression, and alternative
splicing deep into the allele frequency (AF)
spectrum. We integrated these three
transcriptomic signals across 49 tissues, along with
diverse genomic annotations, to prioritize
high-impact RVs, and assessed their relationship to
complex traits in the UK Biobank (UKBB) (12),
the Million Veterans Program (MVP) (13), and
the Jackson Heart Study (JHS) (14). We further
identified dozens of candidate RVs influencing
well-studied disease genes, including APOE,
FAAH, and MAPK3.


Results 
Detection of aberrant gene expression across 
multiple transcriptomic phenotypes 


We quantified three transcriptional pheno- 
types for each gene to capture a wide range 
of functional effects caused by regulatory ge- 
netic variants. Briefly, to identify expression 
outliers (eOutliers), we generated Z scores from 
corrected expression data per tissue to
determine whether a gene in an individual has
extremely high or low expression (fig. S1) (15, 16).
To identify genes with excessive allelic imbal- 
ance [allele-specific expression (ASE) outliers 
(aseOutliers)] we used ANEVA-DOT (analysis 
of expression variation-dosage outlier test; 
figs. S2 and S3) (16, 17). This method uses es- 
timates of genetic variation in dosage of each 
gene in a population to identify genes for 
which an individual has a heterozygous var- 
iant with an unusually strong effect on gene 
regulation (17). Splicing outliers (sOutliers) 
were detected using SPOT (splicing outlier 
detection), an approach introduced here that 
fits a Dirichlet-Multinomial distribution di- 
rectly to counts of reads split across alterna- 
tively spliced exon-exon junctions for each 
gene. SPOT then identifies individuals that 
deviate significantly from the expectation on 
the basis of this fitted distribution (figs. S4 to 
S6) (16). Each of the three methods was ap- 
plied across all GTEx samples. An individual 
was called a multitissue outlier for a given 
gene if its median outlier statistic across all 
measured tissues exceeded a chosen thresh- 
old (Fig. 1A) (16). Using this multitissue ap- 
proach for each phenotype, we found that each 
individual had a median of four eOutlier, four 
aseOutlier, and five sOutlier genes. 
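
The multitissue calling rule described above can be sketched for expression outliers as follows. The sketch assumes that per-tissue Z scores have already been computed for one individual and uses the |median Z| > 3 threshold given in the Fig. 1 legend; the gene names, Z scores, and the function name `multitissue_outliers` are hypothetical.

```python
from statistics import median


def multitissue_outliers(z_by_gene, threshold=3.0):
    """Return {gene: median Z} for genes whose |median Z| across tissues exceeds threshold.

    z_by_gene maps a gene to the list of per-tissue Z scores for one individual.
    """
    calls = {}
    for gene, z_scores in z_by_gene.items():
        m = median(z_scores)
        if abs(m) > threshold:
            calls[gene] = m
    return calls


# Hypothetical Z scores for one individual across the tissues in which each gene was measured.
z_by_gene = {
    "GENE1": [3.5, 4.1, 2.9, 3.8],  # consistently high across tissues
    "GENE2": [0.2, -0.5, 5.0],      # extreme in a single tissue only
    "GENE3": [-3.2, -4.0, -3.6],    # consistently low across tissues
}

calls = multitissue_outliers(z_by_gene)
print(sorted(calls))
```

Using the median rather than the maximum means that a gene that is extreme in only one tissue (GENE2 here) is not called a multitissue outlier.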


Genes with aberrant expression, ASE, 
and splicing are enriched for functionally 
distinct RVs 


We observed that multitissue outliers for any 
of the three transcriptomic phenotypes were 
significantly more likely to carry an RV (MAF
<1%) in the gene body or ±10 kb than
individuals without outliers, assessed among 714
individuals with European ancestry. These 
enrichments were progressively more pro- 
nounced for rarer variants and were stronger 
for structural variants (SVs) than for single- 
nucleotide variants (SNVs) and indels (Fig. 
1B). These trends were not reliant on the spe- 
cific choice of the threshold used to define out- 
liers (figs. S7 and S8). 
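
The enrichment reported here is a relative risk: the proportion of outlier individuals carrying a nearby RV divided by the proportion of nonoutlier individuals carrying one. The sketch below assumes a log-scale normal approximation for the 95% confidence interval, which may differ from the interval method actually used in the study; the counts are hypothetical.

```python
import math


def relative_risk(out_with_rv, out_total, non_with_rv, non_total, z=1.96):
    """Relative risk of carrying a nearby RV (outliers vs. nonoutliers) with a CI.

    The interval uses the normal approximation on the log scale.
    """
    p_out = out_with_rv / out_total
    p_non = non_with_rv / non_total
    rr = p_out / p_non
    se_log = math.sqrt(
        1 / out_with_rv - 1 / out_total + 1 / non_with_rv - 1 / non_total
    )
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical counts: 60 of 100 outliers and 3000 of 10,000 nonoutliers carry a nearby RV.
rr, lower, upper = relative_risk(60, 100, 3000, 10000)
print(f"relative risk = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1 indicates enrichment of RVs near outlier genes; repeating the calculation within each variant frequency class gives a comparison of the kind shown in Fig. 1B.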

We found only 35 cases in which an indi- 
vidual gene was a multitissue outlier for all 
three transcriptional phenotypes. All but one 
of these had a nearby RV, and most were an- 
notated as splice variants. Among genes that 
were outliers for two transcriptional phenotypes 
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Fig. 1. Enrichment of RVs underlying aberrant expression, splicing, and 
ASE. (A) RNA-seq data in 838 individuals were combined across 49 tissues and 
used to identify shared tissue expression, ASE, and alternative splicing outliers. 
(B) Relative risk of new (not in gnomAD), singleton, doubleton, rare (MAF 
<1%), and low-frequency (MAF 1 to 5%) variants in a 10-kb window around 
the outlier genes across all data types compared with nonoutlier individuals for 
the same genes. Outliers were defined as those with values >3 SDs from the 
mean (|median Z| > 3) or, equivalently, a median P < 0.0027. Bars represent the 
95% confidence interval. (C) Assigning each outlier its most consequential


in an individual (n = 465), the greatest overlap 
occurred between aseOutliers and eOutliers 
(n = 319; fig. S9A). We found that aseOutliers 
with modest expression changes (1 < |median 
Z| < 3) showed stronger enrichment for near- 
by RVs than those without any expression 
change (fig. S9), highlighting an important 
benefit of combining these phenotypes to dis- 
cover diverse RV effects. We found that genes 
for which no outlier individuals were identi- 
fied were enriched for Gene Ontology biological 
process terms relating to sensory perception 
and detection of chemical stimuli for all out- 
lier types (fig. S10) (16), which is consistent 
with enrichments seen for genes that do not 
have any cis-expression quantitative trait loci 
(eQTLs) discovered in GTEx (8). 

We found that different types of genetic 
variants contribute to outliers for the three 
molecular phenotypes, although rare splice 
donor variants were enriched near all outlier 
types (Fig. 1C). The largest differences in variant
type enrichment among the three outlier
types were copy number variations (CNVs) 
and duplications, which were almost exclu- 
sively associated with eOutliers, and splice 
acceptor variants, which were enriched con- 
siderably more within sOutliers (fig. S11). 

For all phenotypes, the proportion of out- 
liers with a nearby RV of any category in- 
creased with threshold stringency (Fig. 1D). 
For eOutliers, aseOutliers, and sOutliers, at 
the strictest threshold of median outlier P < 
1.1 x 10°’, most individuals were carrying at 
least one RV nearby the outlier gene (82 to 
94%). When looking further at RVs with func- 
tional annotations (from the annotations 
listed in Fig. 1C), we found that underex- 
pressed eOutliers were the most interpretable, 
with 88% of outlier-associated RVs having 
an additional functional annotation, whereas 
aseOutliers had the lowest proportion at 56% 
(Fig. 1D). This analysis provides further insight 
into expectations for causal RV types when an 
outlier effect of a specific magnitude is ob- 
served in an individual. 

Conversely, a large proportion of genes with 
nearby rare genetic variants did not appear 
as outliers, even for the most predictive classes 
such as loss-of-function variants. The largest 
proportion of variants leading to any outlier 
status were rare splice donor and splice ac- 
ceptor variants, of which only 7.2 and 6.8%, 
duplication. (E) Proportion


respectively, led to an sOutlier (Fig. 1E and fig. 
S11). Overall, whereas some transcriptomic ef- 
fects may have been missed, the low frequency 
with which RVs of these classes led to large 
transcriptome changes reinforces the utility 
of incorporating functional data in variant 
interpretation even for specific variant classes 
already used in clinical interpretation. 


Genomic position of RVs predicts 
the impact on expression 


Although we primarily assessed RVs that oc- 
cur either within an outlier gene or in a 10-kb 
surrounding window, gene regulation can 
occur at greater distances (19, 20). Because 
we observed the strongest enrichments for 
the lowest-frequency variants, we intersected 
singleton variants [i.e., SVs appearing only
once in GTEx, and SNVs and/or indels that
do not appear in the Genome Aggregation
Database (gnomAD) (21)] with 200-kb-length
windows exclusive of other windows
and upstream from outlier genes and com- 
pared their frequency in outlier versus non- 
outlier individuals. SNV enrichments dropped 
off quickly at greater distances from the gene 
but remained weakly enriched for eOutliers 
out to 200 kb. The same was true for rare indels, 
with enrichment at 200 kb only for sOutliers. 
SVs remained enriched at much longer dis- 
tances, being enriched 2.33-fold as far as 800 kb 
to 1 Mb upstream and up to 600 kb down- 
stream of the gene body (Fig. 2A and fig. S12A). 
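
The exclusive upstream windows can be sketched as a simple binning step. The function below assigns a variant to a nonoverlapping 200-kb window by its distance upstream of the gene, taking strand into account; the coordinate convention, the 1-Mb limit, and all names are assumptions made for illustration.

```python
def upstream_bin(variant_pos, gene_start, gene_end, strand, window=200_000, max_dist=1_000_000):
    """Return the index of the exclusive upstream window holding the variant.

    Index 0 covers distances 1..window, index 1 covers window+1..2*window, and
    so on. Returns None if the variant is not upstream or is beyond max_dist.
    """
    if strand == "+":
        dist = gene_start - variant_pos  # upstream means a smaller coordinate
    else:
        dist = variant_pos - gene_end    # upstream means a larger coordinate
    if dist <= 0 or dist > max_dist:
        return None
    return (dist - 1) // window


def bin_label(index, window=200_000):
    """Human-readable label for a window index, in kb."""
    kb = window // 1000
    return f"{index * kb}-{(index + 1) * kb} kb"


# Hypothetical gene on the plus strand spanning 1,000,000 to 1,050,000.
print(bin_label(upstream_bin(100_000, 1_000_000, 1_050_000, "+")))
```

Counting outlier and nonoutlier carriers per window and comparing the two fractions then gives one relative risk per distance bin.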

RVs in promoter regions have been previ- 
ously linked to outlier expression (5, 15). To 
extend these observations and to assess the 
types of transcription factor (TF)-binding 
sites that could lead to outliers, we tested en- 
richment of rare transcription start site (TSS) 
proximal variants in specific TF motifs near 
under- and over-eOutliers. For under-eOutliers, 
we saw an enrichment of variants in GABP, a 
TF that activates genes that control the cell 
cycle, differentiation, and other critical func- 
tions (22). For over-eOutliers, we saw an en- 
richment of RVs intersecting the E2F4 motif, 
a TF that has been reported as a transcriptional
repressor (23). In both under- and over-eOutliers,
we saw RVs in YY1, which can act as
either an activator or repressor, depending on 
context (24), and has been associated with 
GABP in coregulatory networks (Fig. 2B and 


fig. S12B) (25). Thus, these naturally occurring 
nearby RV, the relative risk for different categories of RVs falling within 10 kb
of each outlier type. The inset panel shows enrichments for a subset of 
variant categories on a log(2)-transformed y-axis scale for better visibility. 

(D) Proportion of outliers at a given threshold that have a nearby RV in the given 
category. eOutlier |median Z scores| were converted to P values using the 
cumulative probability density function for the normal distribution. TE, 
transposable element; INV, inversion; BND, break end; DEL, deletion; DUP, 


of RVs in a given category that lead to an outlier at a 


P-value threshold of 0.0027 across types. 


RV perturbations can provide information 
about how specific TFs can strongly up- or 
down-regulate their target genes. 


RVs can affect multiple genes and lead 
to new gene fusion 


We observed that RVs can also affect multiple 
genes in an individual. We found a strong en- 
richment for multigene effects among eOutliers 
and, to a lesser degree, aseOutliers (fig. S13). 
As expected, we did not see enrichment for 
nearby sOutlier pairs, which are less subject to 
coregulation (26). Within a 100-kb window, 
neighboring eOutlier genes were 70 times more 
frequent than would be expected by chance 
if drawing outlier pairs at random. They were 
also significantly enriched for rare CNVs, du- 
plications, and TSS variants nearby one or 
both genes compared with individuals who 
had outlier expression but for only one of the 
genes (fig. S13). We also found that rare SV 
enrichments were present near eOutliers re- 
gardless of whether the SV overlapped the 
gene itself (fig. S14). We observed 27 examples 
of rare SVs, including deletions, duplications, 
and break ends, associated with eOutliers in at 
least two genes in the same individual (fig. S15 
and table S1). For one of these, we observed 
evidence of a fusion transcript resulting from a 
deletion spanning the end of the gene SPTBN1 
and the TSS of EML6. This deletion led to un- 
derexpression of SPTBN1 (median Z score = 
-4.67) and overexpression of EML6 (median 
Z score = 8.12) compared with all other in- 
dividuals. Supporting the presence of a new 
germline fusion transcript, we found evidence 
of a specific transcript spanning both SPTBN1 
and EML6 in multiple tissues for the individ- 
ual with the deletion (fig. S16). For both of these 
genes, this individual also showed sOutlier 
signal (median SPOT P = 0.0005 for EML6 
and 0.0035 for SPTBN1). The identification of 
fusion transcripts has been of particular inter- 
est in cancer diagnosis and prognosis (27-30), 
and both EML genes and SPTBN1 have been
previously implicated in cancer-associated
fusions (31, 32).


RVs in splicing consensus sequence drive 
splicing outliers 


Previous studies have shown RVs disrupting 
splice sites result in outlier alternative splicing 
patterns (33, 34). We used sOutlier calls made 
Fig. 2. RV enrichments in specific genomic positions. (A) Relative risk of
SNVs and indels (not found in gnomAD), and SVs (singleton in GTEx) at varying 
distances upstream of outlier genes (bins exclusive) across data types. (B) Proportion 
of eOutliers with TSS RVs in promoter motifs within 1000 bp. Under and over bins 
are defined with a median Z score threshold of 3, and controls are all individuals 
with a median Z score <3 for the same set of outlier genes. (C) Graphic
summarizing positional nomenclature relative to observed donor and acceptor 
splice sites. (D) Relative risk (y-axis) of an sOutlier (median LeafCutter cluster 

P <1x10°°) RV being located at a specific position relative to the splice site 


(x-axis) compared with nonoutlier RVs. Relative risk ca 


for each LeafCutter cluster (16, 35) to assess 
enrichment of splicing-related variants more 
precisely. We observed extreme enrichment of 
RVs near splice sites in sOutliers. An sOutlier 
was 333 times more likely than a nonoutlier to 
harbor a RV within a 2-bp window around a 
splice site (fig. S17A) (16), with signal decaying 
at greater distances but still enriched up to 


100 bp away (relative risk = 7.43). To obtain 
culation was done


base pair resolution enrichments, we computed 
the relative risk of sOutlier RVs located at specific 
positions relative to observed donor and ac- 
ceptor splice sites (16). Ten positions near the 
splice site showed significant enrichment for 
RVs in sOutliers compared with controls (Fig. 2, 
C and D). These positions corresponded pre- 
cisely to positions that have also been shown 
to be intolerant to mutations because of their 
separately for donor and acceptor splice sites. (E) Independent position weight
matrices showing mutation spectra of sOutlier (median LeafCutter cluster 

P <1x10°°) RVs at positions relative to splice sites with negative junction usage 
(i.e., splice sites used less in outlier individuals than in nonoutliers). (F) Junction 
usage of a splice site is the natural log of the fraction of reads in a LeafCutter 
cluster mapping to the splice site of interest in sOutlier (median LeafCutter cluster 
P <1 10°) samples relative to the fraction in nonoutlier samples aggregated 
across tissues by taking the median (16). Junction usage (y-axis) of the closest 
splice sites to RVs that lie within a polypyrimidine tract (A − 5, A − 35) binned
by the type of variant (x-axis). 


conserved role in splicing (we will refer to 
these positions as the splicing consensus se- 
quence) (34). Among the most enriched posi- 
tions within the splicing consensus sequence 
were the four essential splice site positions 
(D+1, D+2, A−2, A−1) (36), which showed
an average relative risk of 195. 

sOutliers further captured the transcrip- 
tional consequences both for variants that 
disrupted a reference splicing consensus sequence
and those that created a new splicing
consensus sequence. Individuals with sOutlier 
variants in which the rare allele deviated away 
from the splicing consensus sequence showed 
decreased junction usage of the splice site near 
the variant, whereas individuals with variants 
in which the rare allele created a splicing con- 
sensus sequence showed increased junction 
usage of the splice site near the variant rela- 
tive to nonoutliers (Fig. 2E and figs. S17B and 
S18) (16). We saw a related enrichment pattern 
after separating annotated and new (unan- 
notated) splice sites (fig. S19). sOutliers were 
also enriched for RVs positioned within the 
polypyrimidine tract (PPT), a highly conserved, 
pyrimidine-rich region, ~5 to 35 bp upstream 
from acceptor splice sites (37). A RV was 
6.25 times more likely to be located in the PPT 
near an sOutlier relative to a nonoutlier. sOutliers 
with a RV that changed a position in the PPT 
from a pyrimidine to a purine (i.e., disrupting 
an existing PPT) showed decreased junction 
usage of the splice site near the variant, where- 
as the inverse was true for variants that changed 
a position in the PPT from a purine to pyri- 
midine (Fig. 2F and fig. S20). 
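
Junction usage, as defined in the Fig. 2 legend, is the natural log of the fraction of cluster reads mapping to a splice site in sOutlier samples relative to the fraction in nonoutlier samples, aggregated across tissues by the median. A minimal sketch follows. It assumes that the log ratio is computed per tissue before taking the median and adds a small pseudocount to avoid taking the log of zero; both choices, and all names and values, are illustrative assumptions.

```python
import math
from statistics import median


def junction_usage(outlier_fraction_by_tissue, nonoutlier_fraction_by_tissue, pseudocount=1e-6):
    """Median across shared tissues of ln(outlier fraction / nonoutlier fraction)."""
    shared = set(outlier_fraction_by_tissue) & set(nonoutlier_fraction_by_tissue)
    log_ratios = [
        math.log(
            (outlier_fraction_by_tissue[t] + pseudocount)
            / (nonoutlier_fraction_by_tissue[t] + pseudocount)
        )
        for t in shared
    ]
    return median(log_ratios)


# Hypothetical read fractions for the splice site nearest a rare variant.
outlier = {"lung": 0.10, "skin": 0.20, "liver": 0.25}
nonoutlier = {"lung": 0.50, "skin": 0.50, "liver": 0.50}
usage = junction_usage(outlier, nonoutlier)
print(usage)
```

A negative value indicates that the splice site is used less in the outlier individual, as expected for a variant that disrupts the consensus sequence or the PPT; a positive value indicates increased usage.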


RVs in tissue-specific regulatory regions can 
lead to tissue-specific outlier expression 


Although multitissue outliers offer improved 
power to detect RV effects, we also evaluated 
RVs from outliers detected in individual tis- 
sues. Single-tissue measurements are subject 
to greater variation than repeat measurements 
across tissues but are representative of most 
experimental designs. First, we performed rep- 
lication analysis across all individuals with 
data available for the three methods to evaluate 
the degree to which outlier status detected in 
one tissue of an individual was replicated in 
other tissues (16). On average, we found that
eOutlier, aseOutlier, and sOutlier status in a 
discovery tissue was detected in a test tissue 
5.1, 10.7, and 8.7% of the time, respectively (Fig. 
3A and fig. S21). This is consistent with other
findings that measurements of ASE are more 
consistent across tissues (18). Considering clin- 
ically accessible tissues, namely whole blood, 
fibroblasts, and lymphoblastoid cells, if we 
consider outliers observed for a gene in at least 
two of these tissues in the same individual, we 
saw average replication rates across all other 
tissues of 14.1, 20.9, and 15.0% for eOutliers, 
aseOutliers, and sOutliers, respectively (fig. S22). 
Both the higher replication rate for aseOutliers 
and the increase in outlier replication in non- 
accessible tissues when considering more than 
one accessible measurement are informative 
for the analysis of functional data from easily 
accessible tissues to understand disease states 
most relevant to other tissues. 

We next evaluated the ability of single-tissue 
outliers from each method to prioritize RVs 
near outlier genes. Single-tissue aseOutliers
were most enriched for nearby RVs, followed 
by sOutliers and then eOutliers, across all out- 
lier cutoff thresholds (Fig. 3B and fig. S21 and 
S23A). We also observed enrichment of variants
likely triggering nonsense-mediated decay
among single-tissue eOutliers, aseOutliers, and 
sOutliers (Fig. 3C and fig. S23B). Additionally, 
we found that single-tissue sOutliers still showed 
strong enrichment for RVs in the splicing con- 
sensus sequence and the PPT (fig. S24). 

Except for rare SVs that notably were en- 
riched at comparable thresholds to multitissue 
eOutliers, single-tissue eOutliers show far weaker 
enrichments relative to multitissue outliers for 
nearby rare SNVs and indels across all thresh- 
olds (fig. S25). To improve discovery of tissue- 
specific outliers, we leveraged the breadth of 
tissue data available and used observed pat- 
terns of correlation across tissues to detect out- 
liers that deviate from the expected covariance 
of expression in a subset of tissues (16). A sim- 
ilar approach has been implemented to identify 
functional RVs on the basis of the correlation 
of expression among genes in a single tissue 
(5). We found that outliers identified using 
this approach were often driven by expression 
changes in one or a few tissues compared with 
multitissue eOutliers based on median Z scores 
(Fig. 3D). The correlation tissue-specific outliers 
were also enriched for nearby RVs in a 10-kb 
window around the gene (fig. S26C). However, 
these outliers were also enriched for RVs in 
enhancers that were active in the tissue(s) 
driving the outlier effect (table S2), as deter- 
mined by single-tissue Z score and within a 
500-kb window around the gene (Fig. 3E). No- 
tably, these tissue-specific outliers were depleted 
for rare variation in enhancers annotated in 
other, unmatched tissues. 
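
One way to picture an outlier that deviates from the expected covariance across tissues is the Mahalanobis distance named in the Fig. 3 legend. The sketch below computes the squared distance of a vector of per-tissue Z scores from zero under a given tissue covariance matrix. The zero mean, the small linear solver, and the example covariance are assumptions for illustration, not the implementation used for the correlation outliers.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def mahalanobis_sq(z_scores, covariance):
    """Squared Mahalanobis distance z^T Sigma^{-1} z, assuming a mean of zero."""
    y = solve(covariance, z_scores)
    return sum(z * v for z, v in zip(z_scores, y))


# Two hypothetical tissues whose expression is strongly correlated (r = 0.8).
cov = [[1.0, 0.8], [0.8, 1.0]]
concordant = mahalanobis_sq([2.0, 2.0], cov)   # both tissues shifted the same way
discordant = mahalanobis_sq([2.0, -2.0], cov)  # tissues shifted in opposite directions
print(concordant, discordant)
```

Neither example vector would pass a per-tissue |Z| > 3 threshold, yet the discordant pattern is much farther from expectation because it contradicts the correlation between the two tissues.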


Prioritizing RVs by integrating genomic 
annotations with diverse personal 
transcriptomic signals 


To incorporate diverse transcriptome signals 
into a method to prioritize RVs, we developed 
Watershed, an unsupervised probabilistic 
graphical model that integrates information 
from genomic annotations of a personal genome
(table S3) with multiple signals from a
matched personal transcriptome. Watershed 
provides scores that can be used for personal 
genome interpretation or for cataloging po- 
tentially impactful rare alleles, quantifying the 
posterior probability that a variant has a func- 
tional effect on each transcriptomic phenotype 
based on both whole-genome-sequencing (WGS) 
and RNA-sequencing (RNA-seq) signals (Fig. 4A). 
The Watershed model can be adapted to any 
available collection of molecular phenotypes, 
including different assays, different tissues, or 
different derived signals. Further, Watershed 
automatically learns Markov random field 
(MRF) edge weights reflecting the strength of 
the relationship between the different tissues
or phenotypes included that together allow the 
model to predict functional effects accurately. 

We first applied Watershed to the GTEx v8 
data using the three outlier signals examined 
here, expression, ASE, and splicing (Fig. 4A) 
(16), for which each was first aggregated by 
taking the median across tissues for the corre- 
sponding individual. In agreement with ex- 
isting evidence of similarity between outlier 
signals (fig. S9), the learned Watershed edge 
parameters were strongest between ASE and 
expression, followed by ASE and splicing, but 
strictly positive for all pairs of outlier signals 
(i.e., each outlier signal was informative of all
other signals; Fig. 4B). To evaluate our model, 
we used held-out pairs of individuals that shared 
the same RV, making Watershed predictions 
in the first individual and evaluating those 
predictions using the second individual’s out- 
lier status as a label (15, 16). Watershed out- 
performs methods based on genome sequence 
alone [our genomic annotation model (GAM) 
and combined annotation-dependent deple- 
tion (CADD); Fig. 4C and fig. S27] (38, 39). 
We also compared performance of Watershed 
with RIVER [RNA-informed variant effect on 
regulation (15)], a simplification of the Water- 
shed model in which each outlier signal is 
treated independently. We found that explicitly 
modeling the relationship between different 
molecular phenotypes provided a perform- 
ance gain for Watershed (Fig. 4D, figs. S28 
and S29, and table S4) (16). We observed that 
even the most predictive genomic annota- 
tions only resulted in eOutliers, aseOutliers, 
and sOutliers 2.8, 7.9, and 14.3% of the time, 
respectively (Figs. 1E and 4C). However, in- 
tegrating transcriptomic signals with genomic 
annotations from Watershed (at a posterior 
threshold of 0.9) detected SNVs that resulted 
in eOutliers, aseOutliers, and sOutliers with 
greater frequency 11.1, 33.3, and 71.4% of the 
time, respectively (Fig. 4C and fig. S30). 
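
The held-out evaluation can be sketched as follows: for each pair of individuals sharing the same RV, the posterior computed in the first individual serves as the prediction and the outlier status of the second individual serves as the label. The function below computes precision and recall at one posterior threshold. The posteriors and labels are invented for illustration, and the sketch does not reproduce the Watershed model itself.

```python
def precision_recall(pairs, threshold):
    """Precision and recall at a posterior threshold.

    pairs is an iterable of (posterior_in_first_individual,
    second_individual_is_outlier) tuples, one per held-out pair.
    """
    tp = fp = fn = 0
    for posterior, label in pairs:
        predicted = posterior >= threshold
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Hypothetical held-out pairs.
pairs = [(0.95, True), (0.92, True), (0.91, False), (0.40, True), (0.10, False), (0.05, False)]
precision, recall = precision_recall(pairs, threshold=0.9)
print(precision, recall)
```

Sweeping the threshold from 1 down to 0 traces out a precision-recall curve of the kind shown in Fig. 4D.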

We further extended the Watershed frame- 
work to prioritize variants on the basis of their 
predicted tissue-specific impact. We trained 
three “tissue-Watershed” models (one for each 
of expression, ASE, and splicing separately), 
in which each model considers the effects in 
all tissues jointly, sharing information in the 
MRF, and ultimately outputs 49 tissue-specific 
scores for each RV (figs. S29 and S31) (16). We 
observed that the parameters learned for each 
of the three tissue-Watershed models resem- 
bled known patterns of tissue similarity (Fig. 
4E and fig. S32) (18). Further, using held-out 
individuals, the tissue-Watershed model out- 
performed a RIVER model in which each tis- 
sue is treated completely independently (P = 
2.00 x 10°, P = 2.00 x 10°, and P = 5.90 x 10? 
for expression, ASE, and splicing, respectively; 
one-sided binomial test; Fig. 4F and figs. S33
and S34) and a collapsed RIVER model trained 
Fig. 3. Single-tissue outlier enrichments and replication. (A) Median
replication of outliers identified per tissue across every other tissue for each 
outlier type. (B) Relative risk point estimate for nearby rare SNVs for outliers 
across all tissues individually. (C) Relative risk enrichments for likely gene 
disrupting RVs nearby single-tissue outliers at a threshold of |Z| > 4 (equivalently 
SPOT or ANEVA-DOT P < 0.000063), with one point per tissue. (D) Distribution 
of number of tissues with aberrant expression underlying expression outliers 


with single median outlier statistics (P = 0.0577, 
P = 0.251, and P = 0.00128 for expression, ASE, 
and splicing, respectively; one-sided binomial
test; figs. S35 and S36). Critically, integrative
models that incorporated transcriptomic sig- 
nal and genomic annotations from a single 
tissue still outperformed methods based only 
on genome sequence annotations (Fig. 4F), supporting
the benefit of collecting even a single
RNA-seq sample to improve personal genome 
interpretation. 
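
A one-sided binomial test of the kind cited above can be sketched with the standard library. The sketch assumes that each tissue contributes one trial, counted as a success when one model has the higher AUC(PR) in that tissue, and that the null probability of success is 0.5; this framing is an assumption about how the comparison was set up, and the counts are hypothetical.

```python
from math import comb


def binomial_test_greater(successes, trials, p=0.5):
    """One-sided P value: probability of at least `successes` wins under the null."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical outcome: one model has the higher AUC(PR) in 8 of 10 tissues.
p_value = binomial_test_greater(8, 10)
print(p_value)
```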


Replication and experimental validation 
of predicted RV transcriptome effects 


We first assessed the replication of “candi- 
date causal RVs” previously identified by the 
defined by median Z score (eOutliers) or Mahalanobis distance P value
(correlation). (E) Relative risk of correlation outliers driven by a single tissue, 
defined as significant correlation outliers for which an expression change of the 
degree indicated by the point color is observed in only a single tissue (16) 
carrying a RV in enhancers annotated to that tissue within a 500-kb window of 
the outlier gene. Unmatched are defined as all tissue-specific enhancer regions 
regardless of outlier tissue. 


SardiNIA Project (6), using GTEx Watershed 
prioritization. Of five SardiNIA candidate 
causal RVs that were also present in a GTEx 
individual, four had high (>0.7) GTEx Water- 
shed expression posterior probabilities (table 
S5). Next, we tested replication of GTEx RVs, 
prioritized by Watershed, in an independent co- 
hort evaluating 97 whole-genome and matched 



Fig. 4. Prioritizing functional RVs with Watershed. (A) Graphic summarizing 
plate notation for the Watershed model when it is applied to three median outlier 
signals (expression, ASE, and splicing). (B) Symmetric heatmap showing learned 
Watershed edge parameters (weights) between pairs of outlier signals after training 
Watershed on three median outlier signals. (C) The proportion of RVs with Water- 
shed posterior probability >0.9 (right) and with GAM probability greater than a 
threshold set to match the number of Watershed variants for each outlier signal 
(left) that lead to an outlier at a median P-value threshold of 0.0027 across three 
outlier signals (colors). Watershed and GAM models were evaluated on held-out pairs 


transcriptome samples from the Amish Study 
of Major Affective Disorders (ASMAD) (40). 
We evaluated GTEx RVs also present in this 
cohort at any frequency, quantifying eOutlier, 
aseOutlier, and sOutlier signal in each ASMAD 
individual harboring one of the GTEx var- 
iants (16). For all three phenotypes, ASMAD 
individuals with variants having high (>0.8) 
Watershed posterior probability based on 
GTEx data had significantly more extreme 
outlier signals at nearby genes compared with 
individuals with variants having low (<0.01) 
GTEx Watershed posterior probability (expression:
P = 2.729 x 10°°, ASE: P = 2.86 x
10-?, and splicing: P = 5.86 x 10°"; Wilcoxon 
rank-sum test; fig. S37). Every variant with a 
high GTEx Watershed splicing posterior prob- 
ability (>0.8) resulted in an sOutlier (P < 0.01) 
in the ASMAD cohort. Furthermore, ASMAD 
individuals with variants having high (>0.8) 
GTEx Watershed posterior probability had 
significantly larger outlier signals relative to 
equal size sets of variants prioritized by GAM 
(expression: P = 0.00129, ASE: P = 0.0287, 
and splicing: P = 0.00058; Wilcoxon rank- 
sum test; fig. S37). Overall, RVs prioritized 
of individuals. (D) Precision-recall curves comparing performance of Watershed,
RIVER, and GAM (colors) using held-out pairs of individuals for three median outlier 
signals. (E) Symmetric heatmap showing learned tissue-Watershed edge parameters 
(weights) between pairs of tissue outlier signals after training tissue-Watershed 

on eOutliers across single tissues. Tissue color to tissue name mapping can be found 
in fig. S21D. (F) Area under precision recall curves [AUC(PR); y-axis] in a single 
tissue between tissue-GAM, tissue-RIVER, and tissue-Watershed (x-axis) when 
applied to outliers across single tissues in all three outlier signals (colors). Precision 
recall curves in each tissue were generated using held-out pairs of individuals. 


by Watershed using GTEx data displayed 
evidence of functional effects in ASMAD 
individuals. 
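
The comparison of outlier signals between variants with high and low posteriors relies on the Wilcoxon rank-sum test. A minimal one-sided sketch is shown below. It uses midranks for ties and the normal approximation without tie or continuity corrections, which is a simplification; the input values are hypothetical outlier statistics, not ASMAD data.

```python
import math


def rank_sum_test_greater(x, y):
    """One-sided rank-sum test that values in x tend to exceed values in y.

    Returns (U statistic for x, approximate P value from the normal approximation).
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return u, p_value


# Hypothetical |Z| values near variants with high vs. low posterior probability.
high_posterior = [3.1, 4.0, 5.2, 6.3]
low_posterior = [0.5, 1.0, 1.5, 2.0, 2.5]
u_stat, p_value = rank_sum_test_greater(high_posterior, low_posterior)
print(u_stat, p_value)
```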

We further applied both a massively parallel 
reporter assay (MPRA) and a CRISPR-Cas9 
assay to assess the impact of Watershed- 
prioritized RVs. We experimentally tested the 
regulatory effects of 52 variants with moderate
Watershed expression posterior (≥0.5)
and 98 variants with low Watershed expres- 
sion posterior (<0.5) using MPRA (16). We 
observed increased effect sizes for RVs with 
high Watershed expression posterior relative 
to variants with low expression posterior (P =
0.025; one-sided Wilcoxon rank-sum test; 
fig. S38 and table S6). Next, we assessed the 
functional effects of 20 variants by editing 
them into inducible-Cas9 293T cell lines. These 
included 14 rare stop-gained variants and six 
non-eQTL common variants as negative con- 
trols. Of the 14 rare stop-gained variants, 13 
had expression or ASE Watershed posterior 
>0.8, with the remaining variant [previously 
tested in (41)] having a posterior of 0.22. All 
control variants had Watershed posteriors 
<0.03. Of the 13 variants with a Watershed 
posterior >0.8, 12 showed a significant de- 
crease in expression of the rare allele (P < 
0.05, Bonferroni corrected; fig. S39 and table
S7) (16).


Aberrant expression informs 
RV trait associations 


We found that each individual had a median 
of three eOutliers, aseOutliers, and sOutliers 
(median outlier P < 0.0027) with a nearby 
RV. When filtering by moderate Watershed 
posterior probability (>0.5) of affecting ex- 
pression, ASE, or splicing, individuals had a 
median of 17 genes with RVs predicted to 
affect expression, 27 predicted to affect ASE, 
and nine predicted to affect splicing (Fig. 5A). 
From the set of outlier calls, we found mul- 
tiple instances of RVs influencing well-known 
and well-studied genes, including APOE and 
FAAH (table S8). In particular, for APOE, which 
has been associated with numerous neuro- 
logical diseases and psychiatric disorders 
(42), we found two aseOutlier individuals both 
carrying a rare, missense variant, rs563571689, 
with ASE Watershed posteriors >0.95, not pre- 
viously reported. For FAAH, which has been 
linked to pain sensitivity in numerous con- 
texts (43, 44), we found two eOutlier individ- 
uals with a rare 5’ untranslated region variant, 
rs200388505, with ASE and expression Watershed
posteriors >0.9.

To assess whether identified rare functional 
variants from GTEx associate with traits, we 
intersected this set with variants present in 
the UKBB (12). We focused on a subset of 34 
traits for which GWAS association for a UKBB 
trait had evidence of colocalizations with eQTLs 
and/or alternative splicing QTLs (sQTLs) in 
any tissue (table S9) (16, 45). GTEx has demon- 
strated that genes with RV associations for a 
trait are strongly enriched for their eQTLs 
colocalizing with GWAS signals for the same 
trait (18), indicating that QTL evidence can be
used to guide RV analysis. Furthermore, RVs 
near GTEx outliers had larger trait associa- 
tion effect sizes than background RVs near 
the same set of genes in the UKBB data (P = 
3.51 x 10 °; one-sided Wilcoxon rank-sum test), 
with a shift in median effect size percentile 
from 46 to 53%. Notably, outlier variants that 
fell in or nearby genes with an eQTL or sQTL 
colocalization had even larger effect sizes (median
effect size percentile 88%) than nonoutlier
variants (P = 1.93 x 10°°; one-sided
Wilcoxon rank-sum test) or outlier variants fall- 
ing near any gene not matched to a colocaliz- 
ing trait (P = 4.88 x 10°°; one-sided Wilcoxon 
rank-sum test; Fig. 5B). 

Although most variants tested in UKBB had 
low Watershed posterior probabilities of af- 
fecting the transcriptome (fig. S40A), we hy- 
pothesized that filtering for those variants 
that do have high posteriors would yield var- 
iants in the upper end of the effect size dis- 
tribution for a given trait. For each variant tested 
in UKBB, we took the maximum Watershed 
posterior per variant and compared this with 
a genomic annotation-defined metric, CADD 
(38, 39). We found that Watershed posteriors 
were a better predictor of variant effect size 
than CADD scores for the same set of RVs in 
a linear model (Table 1). Across different 
Watershed posterior thresholds, we found that 
the proportion of variants falling in the top 
25% of RV effect sizes in colocalized regions 
exceeded the proportion expected by chance 
(Fig. 5C). Whereas filtering by CADD score 
did return some high effect size variants, this 
proportion declined at the highest thresholds 
(fig. S40D). Furthermore, there was very little 
overlap between variants with high Watershed 
posteriors and high CADD variants (fig. S40D), 
with CADD variants more likely to occur in 
coding regions and Watershed variants more 
frequent in noncoding regions (fig. S40D). 
Thus, the approaches largely identified dis- 
tinct and complementary sets of variants for 
these traits. 

We identified 33 rare GTEx variant trait 
combinations in which the variant had a Water- 
shed posterior >0.5 and fell in the top 25% of 
variants by effect size for the given trait (table 
S10). We highlight two such examples, for 
asthma and high cholesterol (Fig. 5, D and E), 
showing that although RVs usually do not 
have the frequency to obtain genome-wide sig- 
nificant P values, when they are prioritized 
by the probability of affecting expression, we 
could identify those with greater estimated 
effect sizes on the trait (table S11). In the case 
of asthma, the RV effect sizes in UKBB were 
three times greater than the lead colocalized 
variant. These variants included rs146597587, 
which is a high-confidence loss-of-function 
splice acceptor with an overall gnomAD AF of 
0.0019, and rs149045797, an intronic variant 
with a frequency of 0.0019, both of which were 
associated with the gene IL33, the expression
of which has been implicated in asthma
(46, 47). Previous work has identified the protective
association between rs146597587 and
asthma (48, 49), and we found that this is potentially
mediated by outlier allelic expression
of IL33 leading to moderate decreases in
total expression, with median Z scores ranging
from -1.08 to -1.77 in individuals with the
variant, and median single-tissue Z scores 
across the six individuals exceeding -2 in 
10 tissues. An asthma association had also been 
reported recently for the other high Watershed 
asthma-associated variant rs149045797 and 
was in perfect linkage disequilibrium with 
18146597587 (50). An additional high Watershed 
variant, rs564796245, an intronic variant in 
TTC38 with a gnomAD AF of 0.0003, had a 
high effect size for self-reported high choles- 
terol in the UKBB but was not previously 
reported. We were able to test this variant 
against four related blood lipid traits in the 
MVP (52). We found that for these traits, which 
included high-density lipoprotein (HDL), low- 
density lipoprotein, total cholesterol, and tri- 
glycerides, among rare (gnomAD AF <0.1%) 
variants within a 250-kb window of rs564796245, 
this variant was in the top 5% of variants by 
effect size; for HDL specifically, it was in the 
top 1% (fig. S41). We also assessed this var- 
iant’s association with the same four traits in 
the JHS (14), an African American cohort in 
which four individuals carried the RV. Here, 
we found that the direction of effect was con- 
sistent with MVP and UKBB for all four traits 
(tables S11 and S12), and the variant fell in the 
top 28th to 38th percentile of all rare (gnomAD 
AF <0.1%) variants in this region (fig. S42). 
Only four of the variants tested in UKBB had 
Watershed posterior probabilities >0.9 for 
colocalized genes, but of those, three showed 
high effect sizes for a relevant trait (table S10). 


Discussion 


RVs are abundant in human genomes, yet 
they have remained difficult to study system- 
atically. Using multitissue transcriptome and 
whole-genome data from GTEx v8, we have 
been able to identify and assess the properties 
of RVs, including SVs, that underlie extreme 
changes in expression, alternative splicing, 
and ASE. 

We observed that each signal informs dis- 
tinct classes of RVs, demonstrating the bene- 
fit of integrating multiple sources of personal 
molecular data to improve variant interpre- 
tation. We expanded characterization of the 
properties of RVs in multiple contexts, in- 
cluding structural variants affecting multiple 
genes, rare splice variants that disrupt or cre- 
ate splicing consensus sequences, and RVs 
occurring in tissue-specific enhancers leading 
to tissue-specific eOutliers. Together, these 
provide a map of the properties of large-effect 
RVs, aiding their identification and evaluation 
in future studies. We note that although our 
approach can be used to identify some large- 
effect RVs underlying disease, it is unlikely to 
capture the full spectrum of functional RVs 
contributing to heritability because some ef- 
fects will not manifest as clear transcriptome 
aberrations (8). 

We further developed a probabilistic model 
for personal genome interpretation, Watershed, 
which improves standard methods by integrating 
multiple transcriptomic signals from 
the same individual. Relevant to ongoing efforts 
to identify RVs affecting human traits, 
we found that in RVs within trait-colocalized 
regions, filtering by Watershed posteriors can 
identify variants with larger trait effect sizes 
better than relying on genomic annotations 
alone. As further demonstrated by our discovery 
of outlier RVs in well-studied disease genes, 
application of Watershed and other integrative 
methods will prove increasingly helpful 
for cataloging and prioritizing RVs affecting 
traits, especially those at the lowest ends of 
the AF spectrum. Our results provide a means 
to improve the quality and extent of RV 
prioritization, with potential future impacts 
enhancing RV association testing and disease 
gene identification. 

Fig. 5. Trait associations for RVs underlying outlier genes. (A) Distribution of 
the number of outlier genes, outlier genes with a nearby RV, and genes with a 
high Watershed posterior variant per data type. We added one to all values so 
that individuals with 0 are included. (B) Distribution of effect sizes, transformed 
to a percentile, for the set of GTEx RVs that appear in UKBB and are not outlier 
variants, those that are outlier variants, and those outlier variants that fall in 
colocalizing genes for the matched trait across 34 traits. Percentiles were 
calculated on the set of rare GTEx variants that overlap UKBB. The set of genes 
was restricted to those with at least one outlier individual in any data type and 
a nearby variant included in the test set (4787 variants and 1323 genes). P values 
were calculated from a one-sided Wilcoxon rank-sum test. (C) Proportion of 
variants filtered by Watershed posterior that fell in the top 25% of effect sizes for 
a colocalized trait (red) and the proportion of randomly selected variants of 
an equal number that also fall in these regions over 1000 iterations (black). 
(D) Manhattan plot (top) across chromosome 9 for asthma in the UKBB, filtered 
for non-low-confidence variants, with two high-Watershed variants, rs149045797 
and rs146597587, shown in pink and the lead colocalized variant, rs3939286, 
shown in blue. The variants’ effect size ranks were similarly high for both 
self-reported and diagnosed asthma, but the summary statistics are shown for 
asthma diagnosis here. The UKBB MAF versus absolute value of the effect 
size for all variants within 10 kb of the Watershed variant is also shown (bottom). 
(E) Manhattan plot across chromosome 22 for self-reported high cholesterol in 
the UKBB, filtered to remove low confidence variants, with the high-Watershed 
variant rs564796245 shown in pink. The UKBB MAF versus absolute value 
of the effect size for all variants within 10 kb of the Watershed variant is also 
shown (bottom). 


Materials and methods summary 


Detailed materials and methods are available 
in the supplementary materials. Briefly, we 
used RNA-seq and WGS data from the v8 release 
of the GTEx project, which contains 
49 biological tissues with at least 70 samples 
per tissue. 

Table 1. Watershed and CADD as predictors of variant effect size percentile. Shown are the 
coefficient estimates and 95% confidence intervals from separate linear models with variant effect 
size percentile as the response and CADD score or Watershed posterior (scaled to have a mean 
of 0 and an SD of 1 so that values are of comparable range) as the predictor for all tested variants in 
colocalized regions (n = 5277). 

Predictor             Beta          P value         95% confidence interval 
Watershed posterior   [illegible]   [illegible]     [illegible] 
CADD score            0.77          2.41 x 10^-2    0.10-1.43 

For the set of RVs analyzed, we retained all 
SNVs and indels that passed quality control 
in the GTEx v8 variant call format file using 
the hg38 genome build. Structural variants 
were called on the subset of individuals 
available in the GTEx v7 release. We defined RVs 
as those with <1% MAF within GTEx and, for 
SNVs and indels, also occurring at <1% 
frequency in non-Finnish Europeans within 
gnomAD (27). Annotation of protein-coding 
regions and TF-binding site motifs was 
generated by running Ensembl VEP (v88). 

We next used the RNA-seq data to make 
outlier calls in each tissue. Briefly, we 
log2-transformed the expression values [log2(TPM + 
2)], where TPM is the number of transcripts 
per million mapped reads, restricted to lincRNA 
and protein-coding genes with at least six reads 
and TPM >0.1 in at least 20% of individuals. 
We scaled the expression of each gene to a mean 
of 0 and a standard deviation of 1 to avoid 
the deflation of outlier values caused by 
quantile normalization. We corrected for hidden 
factors using PEER [probabilistic estimation 
of expression residuals (52)] to account for 
unmeasured technical confounders, as well as 
the top three genotype principal components, 
sex, and the genotype of the strongest 
cis-eQTL per gene in each tissue. We rescaled the 
residual values per gene and used the resulting 
corrected Z scores to determine eOutliers. 

ASE outlier calls in a single tissue were made 
using ANEVA-DOT to identify genes in each 
individual that showed excessive allelic 
imbalance of ASE relative to the population. 
Briefly, ANEVA-DOT relies on tissue-specific 
estimates of genetic variation in gene dosage, 
V^G, derived by ANEVA on a reference 
population’s ASE data to identify genes in individual 
test samples that are likely affected by RVs 
with unusually large regulatory effects. 

Splicing outlier calls were made in a single 
tissue using SPOT to identify genes in each 
individual that show abnormal splicing 
patterns. Briefly, for a given LeafCutter cluster in 
a given tissue, we defined a matrix, X (dimension 
N x J), where each row corresponds to one of N 
samples, each column corresponds to one of 
J exon-exon junctions mapped to the LeafCutter 
cluster, and each element is the number of 
raw split read counts corresponding to that 
row’s sample and that column’s exon-exon 
junction. We were able to compute a P value 
representing how abnormal a given sample’s 
splicing patterns were for the given LeafCutter 
cluster as follows: 

1. Fitted parameters of Dirichlet-Multinomial 
distribution based on observed data X to cap- 
ture the distribution of split read counts map- 
ping to this LeafCutter cluster; 

2. Used the fitted Dirichlet-Multinomial 
distribution to compute the Mahalanobis 
distance for each of the N samples; and 

3. Computed the Mahalanobis distance for 
1,000,000 samples simulated from the fitted 
Dirichlet-Multinomial distribution and used 
these 1,000,000 Mahalanobis distances as an 
empirical distribution to assess the significance 
of the N real Mahalanobis distances. 
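
Steps 2 and 3 above can be sketched as follows, under several stated assumptions. This is not the SPOT implementation: the Dirichlet-Multinomial parameters `alpha` are taken as already fitted (step 1 is not shown), the null is simulated at a single read depth (the median across samples), and the number of simulations is reduced from 1,000,000 to keep the example fast. All function names are hypothetical. Because the Dirichlet-Multinomial covariance is a scaled version of the multinomial covariance, the squared Mahalanobis distance reduces to a Pearson-type sum and needs no matrix inversion.

```python
import bisect
import math
import random

def dm_mahalanobis(counts, alpha):
    """Mahalanobis distance of one junction-count vector from its expectation
    under a Dirichlet-Multinomial with parameters alpha."""
    n, a0 = sum(counts), sum(alpha)
    scale = n * (a0 + n) / (a0 + 1.0)   # multinomial variance inflated by overdispersion
    d2 = sum((x - n * a / a0) ** 2 / (a / a0) for x, a in zip(counts, alpha)) / scale
    return math.sqrt(d2)

def simulate_dm(alpha, depth, rng):
    """One draw: p ~ Dirichlet(alpha), then counts ~ Multinomial(depth, p)."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]   # unnormalized Dirichlet draw
    picks = rng.choices(range(len(alpha)), weights=g, k=depth)
    return [picks.count(j) for j in range(len(alpha))]

def empirical_pvalues(samples, alpha, n_sim=2000, seed=0):
    """Empirical P value per sample from simulated Mahalanobis distances."""
    rng = random.Random(seed)
    depth = sorted(sum(s) for s in samples)[len(samples) // 2]  # reference depth (assumption)
    null = sorted(dm_mahalanobis(simulate_dm(alpha, depth, rng), alpha)
                  for _ in range(n_sim))
    pvals = []
    for s in samples:
        n_ge = n_sim - bisect.bisect_left(null, dm_mahalanobis(s, alpha))
        pvals.append((1 + n_ge) / (1 + n_sim))      # add-one avoids P = 0
    return pvals
```

With only 2000 simulations the smallest attainable P value is about 0.0005, which is one reason a much larger simulated set is useful in practice.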

To generate multitissue outlier calls for each 
gene and outlier type, we calculated an indi- 
vidual’s median outlier score across all tissues 
for which data were available, restricting the 
analysis to individuals with measurements in 
at least five tissues. To account for situations 
in which widespread extreme expression might 
occur in an individual because of nongenetic 
influences, we excluded individuals in whom 
the proportion of tested genes that were multi- 
tissue outliers, at a P-value threshold of 0.0027, 
exceeded 1.5 times the interquartile range of 
the distribution of proportion of outlier genes 
across all individuals. 
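
A minimal sketch of the multitissue call and the individual-level filter is given below. It rests on assumptions that are not stated in the text: the input layout is hypothetical, the P-value threshold of 0.0027 is treated as |median Z| > 3 (the two-sided normal equivalent), and "1.5 times the interquartile range" is read as the usual upper fence, Q3 + 1.5 x IQR.

```python
from statistics import median, quantiles

def multitissue_calls(z, min_tissues=5, z_cut=3.0):
    """z maps individual -> gene -> list of single-tissue Z scores
    (unmeasured tissues are simply absent from the list)."""
    med, prop = {}, {}
    for ind, genes in z.items():
        # Median outlier score, only for genes measured in enough tissues
        med[ind] = {g: median(v) for g, v in genes.items() if len(v) >= min_tissues}
        n_out = sum(abs(m) > z_cut for m in med[ind].values())
        prop[ind] = n_out / len(med[ind]) if med[ind] else 0.0
    # Flag individuals with an unusually high share of outlier genes
    q1, _, q3 = quantiles(prop.values(), n=4)
    fence = q3 + 1.5 * (q3 - q1)
    keep = {ind for ind, p in prop.items() if p <= fence}
    return med, prop, keep
```

The function needs at least two individuals, because the quartiles are undefined otherwise.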

For the correlation-aware outlier calls, we 
determined a subset of individuals and tissues 
with <75% missingness, leading to 762 indi- 
viduals and 29 tissues. We imputed missing 
expression values to improve our estimate of 
the tissue-by-tissue covariance matrix per gene 
that would be used in outlier calling. We used 
K-nearest neighbors in the impute R package 
(53) with k = 200 to impute values for missing 
tissues per individual on a gene-by-gene basis. 
From the imputed matrix, we estimated the 
tissue covariance matrix for each gene. We 
calculated the Mahalanobis distance for each 
gene-individual pair and assigned a P value to 
each gene-individual pair from the chi-squared 
distribution, with degrees of freedom equal to the 
number of tissues available for that individual. 
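
The distance and P-value calculation for one gene-individual pair can be sketched as below. The sketch assumes that the tissue covariance matrix has already been estimated from the imputed data (the imputation step is not shown), takes the squared Mahalanobis distance as the chi-squared statistic, and uses only the tissues observed for the individual. It is written with the standard library so that it runs anywhere; the helper names are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def chi2_sf(x, df):
    """Upper tail of the chi-squared distribution for integer df."""
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0:                 # recurrence on the incomplete gamma function
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

def correlation_aware_pvalue(z, cov):
    """z: per-tissue Z scores for one gene-individual pair (None if unmeasured);
    cov: tissue-by-tissue covariance matrix for the gene."""
    obs = [i for i, v in enumerate(z) if v is not None]
    sub = [[cov[i][j] for j in obs] for i in obs]
    zo = [z[i] for i in obs]
    d2 = sum(v * w for v, w in zip(zo, solve(sub, zo)))   # z^T cov^-1 z
    return d2, chi2_sf(d2, len(obs))
```

Accounting for covariance matters when tissues are correlated: two moderately extreme Z scores in opposite directions are far more surprising than two in the same direction.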
Watershed is a hierarchical Bayesian model 
that predicts the regulatory effects of RVs on 
a specific outlier signal based on the integration 
of multiple transcriptomic signals along 
with genomic annotations describing the RVs. 
Watershed models instances of gene-individual 
pairs to predict the regulatory effects of RVs 
nearby the gene. The Watershed model for a 
particular gene-individual pair, assuming $K$ outlier 
signals, consists of three layers (Fig. 4A): 

1. A set of variables $G$, representing the $P$ observed 
genomic annotations aggregated over 
all RVs in the individual that are nearby the gene. 

2. A set of binary latent variables $Z = Z_1, \ldots, Z_K$ 
representing the unobserved functional 
regulatory status of the RVs on each of the $K$ 
outlier signals. 

3. A set of categorical nodes $E = E_1, \ldots, E_K$ 
representing the observed outlier status of the 
gene for each of the $K$ outlier signals. 

A fully connected conditional random field 
(CRF) is defined over variables $Z$ given $G$. Variables 
$E_k$ are each connected only to the corresponding 
latent variable $Z_k$. Specifically, the 
following conditional probability distributions 
together define the full Watershed model: 


$Z \mid G \sim \mathrm{CRF}(\alpha, \beta_1, \ldots, \beta_K, \theta)$ 

$E_k \mid Z_k \sim \mathrm{Categorical}(\phi_k) \quad \forall k \in K$ 

$\phi_k \sim \mathrm{Dirichlet}(C, \ldots, C)$ 

$\beta_k \sim \mathrm{Normal}(0, 1/\lambda)$ 


where $\beta_k \in \mathbb{R}^P \; \forall k \in K$ are the parameters 
defining the contribution of the genomic 
annotations to the CRF for each outlier signal 
($k$), $\alpha \in \mathbb{R}^K$ are the parameters defining the 
intercept of the CRF for each outlier signal 
($k$), $\theta \in \mathbb{R}^{K(K-1)/2}$ are the parameters defining 
the edge weights between pairs of outlier 
signals, $\phi_k \; \forall k \in K$ are the parameters defining 
the categorical distributions of each outlier 
signal, and $C$ and $\lambda$ are hyperparameters of 
the model. 
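
To make the three-layer structure concrete, the sketch below computes the posterior probability that each latent variable equals 1 for a single gene-individual pair by enumerating all 2^K latent configurations. It is an illustration under assumptions, not the Watershed implementation: the exact form of the CRF potential (an intercept plus annotation term for each signal and a pairwise term that is active when both latent variables equal 1) is assumed, parameters are taken as already learned, and all names are hypothetical.

```python
import itertools
import math

def watershed_posterior(g, e, alpha, beta, theta, phi):
    """P(Z_k = 1 | G = g, E = e) for each of K outlier signals.
    g: list of P annotation values; e: observed outlier category per signal;
    alpha[k], beta[k] (length P): CRF intercept and annotation weights;
    theta[(k, l)] for k < l: edge weights; phi[k][z][e]: P(E_k = e | Z_k = z)."""
    K = len(alpha)
    unary = [alpha[k] + sum(b * x for b, x in zip(beta[k], g)) for k in range(K)]
    weights = {}
    for z in itertools.product((0, 1), repeat=K):
        score = sum(unary[k] * z[k] for k in range(K))
        score += sum(theta[(k, l)] * z[k] * z[l]
                     for k in range(K) for l in range(k + 1, K))
        likelihood = math.prod(phi[k][z[k]][e[k]] for k in range(K))
        weights[z] = math.exp(score) * likelihood   # unnormalized joint weight
    total = sum(weights.values())
    return [sum(w for z, w in weights.items() if z[k]) / total for k in range(K)]
```

With positive edge weights, an outlier observed in one signal raises the posterior for the other signals as well, which is the sense in which the model shares information across transcriptomic signals. Enumeration is only practical for small K.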

For the CRISPR assay, we selected 14 rare 
stop-gained variants as candidates, 
eight of which passed quality control. Candidates 
were chosen by (1) filtering to rare stop-gained 
variants with expression and ASE Watershed 
posteriors >0.9, (2) filtering to those with 
multitissue outlier status in both, and (3) keeping 
the four remaining candidates that lie in complex 
trait genes and the next 10 with the highest 
individual outlier signal and Watershed 
posterior. Variants were tested using the 
polyclonal editing assay described in (41). 
Briefly, inducible-Cas9 293T 
cells were transfected with a guide RNA and 
a single-stranded homologous template 
specific to each variant. After sequencing, the 
effect size was calculated as log2[(Alt/Ref in 
cDNA)/(Alt/Ref in gDNA)] (54). These 
results were combined with six previously tested 
stop-gained and six non-eQTL control 
variants for which Watershed posteriors were 
available. 
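
As a small worked example of the effect size formula, the function below takes read counts for the alternative and reference alleles in cDNA and gDNA. The function and argument names are hypothetical, and the counts in the usage note are made up for illustration.

```python
import math

def allelic_effect_size(alt_cdna, ref_cdna, alt_gdna, ref_gdna):
    """log2 of the Alt/Ref ratio in cDNA relative to the Alt/Ref ratio in gDNA."""
    return math.log2((alt_cdna / ref_cdna) / (alt_gdna / ref_gdna))
```

For instance, 50 alternative and 200 reference reads in cDNA against a balanced 100 and 100 in gDNA give an effect size of -2, meaning the edited allele is expressed at one quarter of the reference level. Dividing by the gDNA ratio corrects for unequal editing efficiency.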
For the MPRA, we designed a set of synthetic 
DNA fragments by retrieving the genomic 
sequence corresponding to a 150-bp 
window centered at each variant of interest 
for the set of eOutlier-associated RVs and 
controls. For each variant, a reference and al- 
ternative sequence was designed that corre- 
sponded to each allele. GM12878 cells were 
cultured, electroporated, and collected. MPRA 
plasmid library construction proceeded as 
described in (55). To assemble oligo-barcode 
pairings, we merged all paired-end reads using 
FLASH2 (56), requiring a minimum 10-bp over- 
lap to retain each pair. Sequences correspond- 
ing to genomic fragments were mapped using 
STAR (57) against a reference assembled using 
the designed oligo library sequences. To count 
reads per individual barcode sequence, we 
took raw single-end reads, extracted the 20-bp 
region corresponding to the random barcode, 
and counted the number of reads per indi- 
vidual sequence. Finally, to generate oligo-level 
read counts, we computed the sum of all bar- 
codes for each oligo within each sample. We 
used negative binomial regression with an in- 
teraction term, implemented using DESeq2 
(58), to identify significant allele-independent 
and allele-dependent regulatory effects. 
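
The barcode counting and oligo-level summation can be sketched as below. The position of the 20-bp barcode within the read (here the first 20 bases) and the barcode-to-oligo lookup table are assumptions made for illustration; the names are hypothetical.

```python
from collections import Counter

def oligo_counts(reads, barcode_to_oligo, start=0, length=20):
    """Count reads per barcode, then sum barcode counts per oligo."""
    barcodes = Counter(read[start:start + length]
                       for read in reads if len(read) >= start + length)
    totals = Counter()
    for barcode, n in barcodes.items():
        oligo = barcode_to_oligo.get(barcode)   # barcodes without a pairing are skipped
        if oligo is not None:
            totals[oligo] += n
    return barcodes, totals
```

Running this per sample yields the oligo-by-sample count matrix that a negative binomial model takes as input.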


To connect outlier-associated RVs to traits, 
we assessed genome-wide association study 
(GWAS) summary statistics from the UKBB 
phase 2, made available by the Neale labora- 
tory (www.nealelab.is/uk-biobank/). We sub- 
setted the variants, either genotyped or imputed, 
in UKBB phase 2 to those that also appeared 
in any GTEx individuals with a frequency of 
<1% in GTEx, resulting in 45,415 SNVs. We 
filtered the set of GTEx RVs in UKBB to those 
in trait-colocalized regions, defined as being in 
a colocalized gene or within a 10-kb window. 
Colocalization calls are detailed in (45). 
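
The two filters described above (rare in GTEx, and inside or within 10 kb of a colocalized gene) can be sketched as follows. The data layout, function names, and coordinates in any example are hypothetical, and the 10-kb window is assumed to extend from the gene boundaries.

```python
def in_colocalized_region(variant, genes, window=10_000):
    """variant: (chrom, pos); genes: iterable of (chrom, start, end)."""
    chrom, pos = variant
    return any(c == chrom and start - window <= pos <= end + window
               for c, start, end in genes)

def filter_rare_colocalized(variants, gtex_maf, genes, max_maf=0.01, window=10_000):
    """Keep variants that are rare in GTEx and fall in a trait-colocalized region."""
    return [v for v in variants
            if gtex_maf.get(v, 1.0) < max_maf            # unknown frequency is not kept
            and in_colocalized_region(v, genes, window)]
```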


Gravitational lenses in galaxy clusters 


The large mass of a galaxy cluster deflects light from background objects, a phenomenon 
known as gravitational lensing. The large-scale gravitational lens caused by the whole cluster 
can be modified by smaller-scale mass concentrations within the cluster, such as individual 
galaxies. Meneghetti et al. examined these small-scale gravitational lenses in observations 
of 11 galaxy clusters. They found small lenses that were an order of magnitude smaller 
than would be expected from cosmological simulations. The authors conclude that there is an 
unidentified problem with either prevailing simulation methods or standard cosmology. —KTS 


Science, this issue p. 1347 


Coupling transcription 
and translation 


In bacteria, the rate of transcrip- 
tion of messenger RNA (mRNA) 
by RNA polymerase (RNAP) 

is coordinated with the rate of 
translation by the first ribosome 
behind RNAP on the mRNA. Two 
groups now present cryo-elec- 
tron microscopy structures that 
show how two transcription 
elongation factors, NusG and 
NusA, participate in this cou- 
pling. Webster et al. found that 


SCIENCE sciencemag.org 


NusG forms a bridge between 
RNAP and the ribosome when 
they are separated by mRNA. 
With shortened mRNA, NusG 
no longer links RNAP and the 
ribosome, but the two are ori- 
ented so that newly transcribed 
mRNA can enter the ribosome. 
Wang et al. provide further 
insight into the effect of mRNA 
length on the complex struc- 
tures. They also include NusA 
and show that the NusG-bridged 
structure is stabilized by NusA. 
—VV 

Science, this issue p. 1355, p.1359 


A Hubble Space 
Telescope image of 
galaxy cluster 
Abell 370, which acts 
as a lens and bends 
light from distant stars 

The states of past climate 
Deep-sea benthic foraminifera 
preserve an essential record 

of Earth's past climate in their 
oxygen- and carbon-isotope 
compositions. However, this 
record lacks sufficient temporal 
resolution and/or age control in 
some places to determine which 
climate forcing and feedback 
mechanisms were most important. 
Westerhold et al. present a 
highly resolved and well-dated 
record of benthic carbon and 
oxygen isotopes for the past 
66 million years. Their recon- 
struction and analysis show that 
Earth’s climate can be grouped 
into discrete states separated by 
transitions related to changing 
greenhouse gas levels and the 
growth of polar ice sheets. Each 
climate state is paced by orbital 
cycles but responds to variations 
in radiative forcing in a state- 
dependent manner. —HJS 
Science, this issue p. 1383 


Keeping rhythm requires 
communication 


In mammals, daily cycles in 
physiology require the synchro- 
nized activity of circadian clocks 
in peripheral organs such as 
the liver, a hub of metabolism. 
Guan et al. generated mice with 
hepatocytes that lack two tran- 
scriptional repressors known to 
be essential for clock function. 
This experimental manipula- 
tion unexpectedly disrupted 
rhythmic gene expression 

and metabolism not only in 
hepatocytes but also in other 
liver cell types. Feeding behavior 
also coregulated circadian 
rhythms in multiple liver cell 
types. Cell-cell communication 
thus appears to be important 

in maintaining the robustness 
of peripheral circadian clocks. 
—PAK 


Science, this issue p. 1388 


A lethal combination 

It is well established that 
predators are essential for the 
structuring and maintenance 
of biotic communities. One of 
the first demonstrations of this 
importance came from studies 
of the importance of sea otters 
to the maintenance of kelp 
forests. Rasher et al. now show 
that the effects caused by the 
absence of this predator can be 
further exacerbated by climate 
warming. In North Pacific kelp 
forests, otter absence led 

to a decline of slow-growing 
calcareous algae through sea 
urchin herbivory, and this pattern 
was amplified by warming 
temperatures. Keystone predators 
are thus essential not only 
for trophic structure but also 
for mitigating the impacts of 
climate change. —SNV 

Science, this issue p. 1351 


PARKINSON’S DISEASE 


PARK7 preservation 
Mutations in the gene PARK7 
lead to the development of early- 
onset Parkinson's disease (PD), 
a neurodegenerative condition 
for which there are currently no 
effective treatments. Boussaad 
et al. identified an exonic splic- 
ing mutation in PARK7 linked 
to PD and studied the effect of 
this mutation in patient-derived 
cellular models. The mutation 
resulted in impaired splicing, 
reduced production of the 
protein DJ-1, and consequent 
mitochondrial dysfunction. 
Rescuing the aberrant splic- 
ing with a chemical rectifier 
of aberrant splicing rescued 
neuronal loss in patient-derived 
brain organoids. These results 
suggest that precision medicine 
targeting specific molecular 
signatures could be an effective 
strategy for PD and possibly 
other neurodegenerative dis- 
eases. —MM 

Sci. Transl. Med. 12, eaau3960 

(2020). 


TROPICAL FOREST 
Degradation exceeds 
deforestation 


Forest degradation is a ubiqui- 
tous form of human disturbance 


of the forest landscape. Activities 
such as selective logging and 
extraction fall short of total 
deforestation but lead to loss of 
biomass and/or fragmentation. 
On the basis of remote sensing 
data at 30-meter spatial resolu- 
tion, Matricardi et al. analyzed 
the extent of forest degrada- 
tion across the entire Brazilian 
Amazon over a ~22-year period 
up to 2014. They found that the 
extent and rate of forest degra- 
dation was equal to or greater 
than deforestation, which has 
important implications for 
carbon, biodiversity, and energy 
balance. -AMS 

Science, this issue p. 1378 


METABOLISM 
Finding calorie restriction 
mimetics 


Calorie restriction extends the 
health span, and this may be 
partially mediated by a drop in 
core body temperature. Guijas 
et al. compared metabolomics 
data from calorie-restricted mice 
housed either at thermoneu- 
trality or a cooler temperature. 
Calorie restriction induced the 
hypothalamus to produce the 
gasotransmitter nitric oxide 
and the opioid peptide leucine 
enkephalin only in mice housed 
at the cooler temperature. 
These and other metabolites 
differentially altered by ambient 
temperature may form the basis 
for treatments that can deliver 
the beneficial effects of calorie 
restriction. -WW 

Sci. Signal. 13, eabb2490 (2020). 


Roadways in the Brazilian Amazon contribute to damaging forest degradation, 
even in the absence of outright deforestation. 

CLIMATE WARMING 
Rapid response 


As the climate warms, Arctic 
temperatures are rising faster 
than temperatures at lower 
latitudes, a phenomenon 
called Arctic amplification. 
Loss of sea ice and snow cover at 
high northern latitudes has long 
been understood to contribute to 
this behavior, but other mechanisms 
have been suggested as well. Previdi 
et al. analyzed climate model simulations 
and concluded that this amplified 
warming response actually begins before sea 
ice loss becomes important and that fast 
atmospheric processes are instead responsible 
for its initiation. Therefore, the loss of sea ice is 
an amplifier of enhanced Arctic warming rather 
than a trigger. —HJS 


Geophys. Res. Lett. 10.1029/2020GL089933 (2020). 


NEUROSCIENCE 
Representation of what 
happened when 


Episodic memory depends on 
the hippocampus and entorhinal 
cortex. Although the temporal 
coding properties of hippocam- 
pal neurons are well known, the 
temporal code in the entorhinal 
cortex, which provides impor- 
tant input to the hippocampus, 

is less understood. Bright et al. 
examined monkey entorhinal 
neuron responses in a 5-second 
period after presentation of an 
image. Entorhinal neurons were 
activated shortly after a visual 
stimulus and then decayed with 
a variety of rates, enabling recon- 
struction of when the image was 
presented. To determine whether 
the pattern of neuronal activation 
depended on the identity of the 
image presented, each image was 
shown twice during the experi- 
ment. These results suggest that 
entorhinal cortex context cells 
carry information about what 
happened in addition to when it 
happened. —PRS 

Proc. Natl. Acad. Sci. U.S.A. 117, 20274 

(2020). 


Published by AAAS 


HUMAN GENETICS 
Extending genetic 


predictions 


Polygenic risk scores (PRSs) 
aggregate genomic informa- 
tion to predict an individual's 
risk of developing diseases 
with a genetic basis. To determine 
links between PRSs and 
health, Wainberg et al. profiled 
the blood plasma of almost 
5000 individuals and exam- 
ined PRSs for 54 diseases. 
From this, they linked PRSs to 
766 detectable traits, including 
those that affect proteins or 
metabolites or are clinically rel- 
evant. Because many of these 
relationships were known, this 
work confirms links between 
genotype and phenotype and 
provides a platform for future 
work. Unexpectedly, some 
healthy individuals with a PRS 
indicating high risk for disease 
had a blood profile similar to 
those from individuals with 
disease. This indicates that 
genetic information can help to 
separate disease risk factors 
from the consequences of a 
pathological condition and 
identify potential preventative 

interventions. —LMZ 

Proc. Natl. Acad. Sci. U.S.A. 117, 21813 
(2020). 


SIGNAL TRANSDUCTION 
Spermatogenesis 
gets a NOD 


NOD-like receptors function as 
pattern recognition receptors 
in the innate immune system, 
but some are only expressed 

in the mammalian germline. 
Yin et al. describe how the 
NOD-like receptor NLRP14 
promotes differentiation of 
primordial germ cell-like cells and 
spermatogenesis in mice. In this case, 
the receptor has a signaling 
role distinct from that of its 
immune counterparts. NLRP14 
formed a complex with the 
chaperone cofactor BAG2 

and with HSPA2, a member 
of the 70-kilodalton heat 
shock protein (HSP70) family. 
Such binding prevented 
ubiquitin-mediated proteasomal 
degradation of HSPA2, 
allowing it to translocate to the 
nucleus, where it helps to 
package spermatid DNA. —LBR 

Proc. Natl. Acad. Sci. U.S.A. 10.1073/ 

pnas.2005533117 (2020). 


ULTRAFAST IMAGING 
Diffractive imaging 
in a flash 

Ultrashort light pulses on the 
time scale of attoseconds 
provide a window into some of 


the fastest electronic effects 
occurring in solid-state systems. 


Diffraction pattern generated 
using simulated attosecond pulses 
for coherent object imaging 


Climate warming is occurring at a much 
higher rate in the Arctic than 
in any other large region 
of the world. (Scale: temperature 
anomaly in °C, from -2 to 2.) 


Obtaining structural informa- 
tion through coherent diffractive 
imaging is usually done with 
monochromatic x-ray sources. 
However, ultrashort pulses 

are inherently broadband, and 
getting transient structural 
information on such short time 
scales is challenging. Rana et al. 
describe a method that works 
with the broadband nature of 
ultrashort pulses. They split 

the pulses into 17 different 
wavelengths and then used an 
algorithm to computationally 
stitch together the diffraction 
patterns from each wavelength 
to reveal the structural image 
optimized across all wave- 
lengths. Demonstrating the 
technique at optical wavelengths 
illustrates the feasibility of apply- 
ing the method to ultrafast x-ray 
pulses. —ISO 


Phys. Rev. Lett. 125, 086101 (2020). 


ELECTROCHEMISTRY 
High-performance 
aqueous Al-ion batteries 


Because of its high abundance, 
low production cost, 
and three-electron redox 
properties, aluminum (Al) 
has received considerable 
attention in recent years for 
the development of possible 
alternatives to conventional 
lithium-based batteries. Yan et 
al. propose an aqueous Al-ion 
battery configuration consisting 
of an AlₓMnO₂ cathode, 
a zinc substrate-supported 
Zn-Al alloy anode, and an 
Al(OTF)₃ aqueous electrolyte. 
This battery demonstrates 
promising values for key per- 
formance indicators such as 
cycling life, reversible capacity, 
discharge voltage plateau, and 
rate capability. The present 
work is an important step in 
designing Al-ion-based batteries 
for practical applications. 
—YS 

J. Am. Chem. Soc. 10.1021/ 

jacs.0c05054 (2020). 


CELL BIOLOGY 
Getting the size right 


Double-membraned autopha- 
gosomes enwrap defunct 
organelles or intracellular 
aggregates, allowing them to 

be delivered to lysosomes, 
where they are degraded. 
Autophagy also allows cells 

to survive short periods of 
starvation by recycling intracel- 
lular components for reuse 

in critical processes. How do 
cells manufacture autopha- 
gosomes of the right size to 
engulf targets? Yamamoto et al. 
identified a protein, ERdj8, that 
is localized to the endoplasmic 
reticulum (a key source of 
autophagic membranes) and 
acts as a size regulator of newly 
formed autophagosomes. When 
ERdj8 was inactivated through 
treatment with small interfer- 
ing RNA, cells produced small 
autophagosomes that could not 
engulf large autophagic targets 
such as damaged mitochondria. 
Increasing the amount of ERdj8 
delayed autophagosome forma- 
tion and allowed prolonged 
extension of the phagophore to 
yield large autophagosomes. 
Thus, ERdj8 allows targeting of 
diverse size objects for recy- 
cling. —SMH 

J. Cell Biol. 219, 201903127 (2020). 


HUMAN GENOMICS 
A survey of transcription 
across tissues 


Some human genetic vari- 
ants affect the amount of RNA 
produced and the splicing of 
gene transcripts, crucial steps 
in development and maintaining 
a healthy individual. However, 
some of these changes only 
occur in a small number of 
tissues within the body. The 
Genotype-Tissue Expression 
(GTEx) project has been 
expanded over time, and, look- 
ing at the final data in version 8, 
Aguet et al. present 
a deep characterization of 
genetic associations and gene 
expression and splicing in 838 
individuals over 49 tissues (see 
the Perspective by Wilson). This 
large study was able to char- 
acterize the details underlying 
many aspects of gene expres- 
sion and provides a resource 
with which to better understand 
the fundamental molecular 
mechanisms of how genetic 
variants affect gene regulation 
and complex traits in humans. 
—LMZ 

Science, this issue p. 1318; 

see also p.1298 


HUMAN GENOMICS 
The role of sex in the 
human transcriptome 


In humans, the inheritance of 
the XX or XY set of sex chromo- 
somes is responsible for most 
individuals developing into 
adults expressing male or female 
sex-specific traits. However, 

the degree to which sex-biased 
gene expression occurs in tis- 
sues, especially those that do 
not contribute to characteristic 
sexually dimorphic traits, is 
unknown. Oliva et al. examined 
Genotype-Tissue Expression 
(GTEx) project data and found 
that 37% of genes in at least one 
of the 44 tissues studied exhibit 
a tissue-specific, sex-biased 
gene expression. They also 
identified a sex-specific variation 
in cellular composition across 
tissues. Overall, the effects of sex 
on gene expression were small, 
but they were genome-wide 
and mostly mediated through 
transcription factor binding. 
With sex-biased gene expression 
associated with loci identified 
in genome-wide association 
studies, this study lays the 
groundwork for identifying the 
molecular basis of male- and 
female-based diseases. —LMZ 
Science, this issue p. 1331 


HUMAN GENOMICS 
Cell type-specific 
quantitative trait loci 


Understanding how human 
genetic variation affects pheno- 
type requires tissue- or even cell 
type-specific measurements. 
Kim-Hellmuth et al. used 
computational methods to identify 
cell-type proportions within bulk 
tissues in the Genotype-Tissue 
Expression (GTEx) project 
dataset to identify cell-type 
interaction quantitative trait loci 
and map these to genetic vari- 
ants correlated with expression 
or splicing differences between 
individuals. By characterizing 
the cellular context, this study 
illustrates how genetic variants 
that operate in a cell type- 
specific manner affect gene 
regulation and can be linked to 
complex traits. This deconvolu- 
tion and analysis of cell types 
from bulk tissues allows greater 
precision in understanding how 
phenotypes are linked to genetic 
variation. —LMZ 

Science, this issue p. 1332 


HUMAN GENOMICS 
Telomere length within 
individuals 


Telomeres are DNA-protein 
complexes that protect chro- 
mosome ends. Their length is 

of great interest because short 
telomeres are associated with 
specific diseases and with aging. 
Demanelis et al. measured 
telomere length from 952 
Genotype-Tissue Expression 
(GTEx) project donors across 
tissues, of which 24 tissue types 
have measurements for more 
than 25 samples. This dataset 
shows that telomere length is 
not constant but is correlated 
across tissues. Most tissue 
telomeres shorten with age, 
but some, such as those in the 
testis and cerebellum, do not. In 
African Americans, telomeres 
are longer on average than those 
from individuals of primarily 
European descent across many 
tissue types. This observation is 
consistent with variability being 
passed from germ cells to zygote 
to differentiated cells during 
development. —LMZ 

Science, this issue p. 1333 


HUMAN GENOMICS 
Functional rare variation 
in transcriptomes 


Every human genome contains 
tens of thousands of rare genetic 
variants—which include single 
nucleotide changes, insertions or 
deletions, and larger structural 
variants—and some may have 
a functional effect. Ferraro et 
al. examined data from indi- 
viduals in the Genotype-Tissue 
Expression (GTEx) project for 
outliers across tissues caused 
by gene expression, splicing, and 
allele-specific expression. Single 
rare variants were observed 
that affected the expression 
and allele-specific expression of 
multiple genes and, in the case 
of a gene fusion event, splicing. 
Experimental and computational 
validation suggest that many 
individuals carry more than 50 
rare variants that affect tran- 
scription in some way. Although 
most variants were predicted 
to not affect an individual's 
phenotype, a small percentage 
showed likely disease-related 
associations, emphasizing the 
importance of studying the 
impact of rare genetic variation 
on the transcriptome. —LMZ 
Science, this issue p. 1334 
SEISMOLOGY 
The great seismic 
quiet period 
Noise from trains, airplanes, 
industrial processes, and other 
sources is recorded on seismom- 
eters worldwide. Disentangling 
this noise is important for 
extracting out natural signals, 
but the noise can also roughly 
track population movements. 
Lecocq et al. compiled seismic 
observations around the world 
and found a substantial decrease 
in noise resulting from lockdown 
measures imposed in response 
to the coronavirus disease 2019 
pandemic (see the Perspective 
by Denolle and Nissen-Meyer). 
These observations tightly cor- 
respond to when the measures 
went into effect and offer a way 
to track aggregate behavior. 
This quiet period also offers the 
chance to extract anthropogenic 
sources of noise from those of 
natural processes. —BG 

Science, this issue p. 1338; 

see also p. 1299 


MICROBIOLOGY 
Microbial therapies 


The gut microbiota, diverse 
microorganisms that inhabit 
our intestines, have an 
increasingly recognized number of roles 
in maintaining human health. 
These roles include maintaining 
digestive health and also more 
systemic roles such as brain 
health. In a Perspective, Wargo 
discusses the developments in 
modulating gut microbiota to 
treat patients with various 
diseases, including irritable bowel 
disease, metabolic syndrome, 
autism, and cancer. Treatment 
can be achieved by fecal microbi- 
ota transplantation from healthy 
donors or by using distinct 
bacterial communities that are 
associated with overall health. 
The advances and challenges of 
this exciting approach to health 
are discussed. —GKA 
Science, this issue p. 1302 
FERROELECTRICS 
Switching to the 
atomic scale 


Ferroelectric materials are 
attractive because they provide 
a way to change electrical 
resistance by using an electric 
field. Lee et al. used simulations 
to explain the persistence of 
ferroelectric behavior in very 
thin films of hafnium oxide (see 
the Perspective by Noheda and 
Íñiguez). The authors’ calculations 
show that ferroelectric 
properties should be found in 
films below 1 nanometer thick. 
This makes the material very 
attractive for the next generation 
of random access memory. —BG 
Science, this issue p. 1343; 
see also p. 1300 
ULTRACOLD PHYSICS 
Laser cooling of 
symmetric top molecule 


Experimental progress over the 
past few decades has led to the 
mastery of ultracold atomic 
gases. A major thrust of cur- 
rent research is to extend this 
success to ultracold molecules, 
which would open qualitatively 
new perspectives for quantum 
information science, preci- 
sion measurement, quantum 
chemistry, and other fields. The 
internal degrees of freedom in 
molecules preclude immediate 
implementation of conventional 
methods. Using a specific 
combination of rovibronic optical 
transitions, Mitra et al. report 
direct Sisyphus laser cooling 
of the symmetric top molecule 
CaOCH3 to temperatures below 
1 millikelvin (see the Perspective 
by Hudson). The proposed 
scheme for cooling is potentially 
applicable to a wide range of 
nonlinear polyatomic molecules. 
—YS 


Science, this issue p. 1366; 
see also p. 1304 


COLLOIDS 


Self-limiting bonding 
Although many routes have 
been developed to link together 
colloidal particles into controlled 
superstructures from dimers all 
the way up to three-dimensional 
lattices, they generally depend 
on coating the nanoparticle 
surfaces in specific ways to 
control the way they link up. 
By contrast, Yi et al. developed 
a ligand chemistry such that, 
when two particles link together, 
it changes the electrostatic 
properties to limit subsequent 
bonding (see the Perspective 
by Gang). Particles are coated 
with complementary polymer 
strands that undergo an acid- 
base neutralization reaction. 
This bonding is controlled by 
the length of the flexible ligands, 
whereas the arrangement of the 
bonded particles is controlled 
by electrostatic repulsions, thus 
giving two parameters to tune 
the shape of the assemblies that 
form. —MSL 

Science, this issue p. 1369; 

see also p. 1305 


ATMOSPHERIC AEROSOLS 
A multiphasic effect 


Aerosols exert a primary influ- 
ence on atmospheric chemistry. 
One of the main controls on 
their internal chemistry is their 
acidity, so understanding what 
determines aerosol pH is fun- 
damental for determining their 
environmental effects. Zheng 
et al. considered how buffering 
capacity in a multiphase aerosol 
system differs from bulk solution 
and found an important role for 
water content in determining pH 
in ammonia-buffered regions. 
Their conclusions underscore 
the important influence of 
ammonia emissions in the 
Anthropocene. —HJS 

Science, this issue p. 1374 


CORONAVIRUS 
A gateway to the cytosol 


Coronaviruses transform host 
cell membranes into peculiar 
double-membrane vesicles 
that have long been thought 
to accommodate viral genome 
replication. However, because 
these compartments appeared 
to be completely sealed, it has 
remained unknown how the 
newly made viral RNA could 
be exported to the cytosol for 
translation and packaging into 
new virions. Wolff et al. used 
cryo-electron microscopy 
to identify a molecular pore 
that spans the double mem- 
brane (see the Perspective by 
Unchwaniwala and Ahlquist). 
Six copies of a large coronavirus 
transmembrane protein formed 
the core of this structure, which 
may constitute a viral RNA 
export channel and provide a 
target for future antiviral inter- 
ventions. —SMH 

Science, this issue p. 1395; 

see also p. 1306 


IMMUNE DEVELOPMENT 
To each their own 


The recombination activating 
genes Rag1 and Rag2 play 
central roles in assembling 
functional T and B cell receptors 
in developing lymphocytes. 
Expression of Rag1 and Rag2 in 
hematopoiesis is restricted to 
these two lymphoid lineages, but 
precisely how this is accomplished 
has remained a mystery. 
Miyazaki et al. identified three 
key enhancer elements that 
recruit the transcription factor 
E2A to promote the expression 
of Rag1 and Rag2 during 
lymphocyte development. By 
generating mouse strains 
lacking one or more of these 
enhancer elements, they report 
that T and B cells use distinct 
enhancer modules to activate 
and maintain expression of Rag1 
and Rag2. —CNF 

Sci. Immunol. 5, eabb1455 (2020). 


HEALTH AND MEDICINE 
3D printed composites 
for cartilage repair 


Damage to cartilage of the joints 
is a common debilitating injury. 
However, because of its limited 
capacity for self-repair, regenera- 
tion of damaged cartilage has 
thus far remained beyond reach. 
Sun et al. created a composite 
that recreates key elements of 
the native structure of articular 
cartilage by three-dimensionally 
printing structural elements and 
cells together in a gradient man- 
ner. When tested in rabbits, the 
composite showed high levels 
of cartilage maturation as well 
as evidence of lubrication at the 
joint surface, which is essential 
to maintaining functionality of 
the joint. Further assessment of 
these materials in larger animal 
models is needed, but such gra- 
dient composites may provide 
a basis for future tissue-engi- 
neered cartilage replacements. 
—JST 

Sci. Adv. 10.1126/sciadv.aay1422 

(2020). 
Global quieting of high-frequency seismic noise due 
to COVID-19 pandemic lockdown measures 

Human activity causes vibrations that propagate into the ground as high-frequency seismic waves. Measures to 
mitigate the coronavirus disease 2019 (COVID-19) pandemic caused widespread changes in human activity, 
leading to a months-long reduction in seismic noise of up to 50%. The 2020 seismic noise quiet period is the 
longest and most prominent global anthropogenic seismic noise reduction on record. Although the reduction is 
strongest at surface seismometers in populated areas, this seismic quiescence extends for many kilometers 
radially and hundreds of meters in depth. This quiet period provides an opportunity to detect subtle signals 
from subsurface seismic sources that would have been concealed in noisier times and to benchmark sources of 
anthropogenic noise. A strong correlation between seismic noise and independent measurements of human 
mobility suggests that seismology provides an absolute, real-time estimate of human activities. 


Seismometers record signals from more 
than just earthquakes: Interactions between 
the solid Earth and fluid bodies, 
such as ocean swell and atmospheric 
pressure (1, 2), are now commonly used 
to image and monitor the subsurface (3). 
Human activity is a third source of seismic 
signal. Nuclear explosions and fluid injection 
or extraction result in impulsive signals, but 
everyday human activity is recorded as a near- 
continuous signal, especially on seismometers 
in urban environments. These complicated 
signals are the superposition of a wide variety 
of activities happening at different times 
and places at or near Earth’s surface but are 
typically stronger during the day than at night, 
weaker on weekends than weekdays, and 
stronger near population centers than sparsely 
inhabited areas (4-7). Seismometers in urban 
environments are important to maximize the 
spatial coverage of seismic networks and to 
warn of local geologic hazards (8), even though 
anthropogenic seismic noise degrades their 
capability to detect transient signals associ- 
ated with earthquakes and volcanic erup- 
tions. Therefore, it is vital to understand urban 
seismic sources, but studies have been limited 
to confined areas or distinct events, such as 
road traffic (9, 10), public transport (7, 1), 
and “football quakes” (11, 12). Broad analysis 
of the long-term global anthropogenic seismic 
wavefield has been lacking. The impact of 
large, coherent changes in human behavior 
on seismic noise is unknown, as is how far it 
propagates and whether seismic recordings 
offer a coarse proxy for monitoring human 
activity patterns. Answering these questions 
has proven challenging because datasets are 
large, monitoring networks are heterogeneous, 
and the many possible noise sources likely 
vary spatially and overlap in time (13). 

The coronavirus disease 2019 (COVID-19) 
outbreak was declared a global health 
emergency in January 2020 (14) and a pandemic in 
March 2020 by the World Health Organization. 
The outbreak resulted in emergency measures 
to reduce the basic reproduction rate of the 
virus (15), beginning in China and Italy and 
then followed by most countries. These mea- 
sures disrupted social and economic behavior 
(16), industrial production (17), and tourism (18). 

Fig. 1. Locations of analyzed seismic stations throughout the world. The map shows locations of the 268 global seismic stations with usable data (e.g., no long data gaps, working sensors) that we analyzed. Lockdown effects were observed (red) at 185 of 268 stations. Symbol size is scaled by the inverse of population density (30) to emphasize stations located in remote areas. The labeled stations are discussed in detail in the text. 

In this paper, we use the term “lockdown” to 
broadly encompass many types of emergency 
measures, such as full quarantine [e.g., in 
Wuhan, China (19-21)], enforced physical 
distancing (e.g., in Italy and the United Kingdom), 
travel restrictions (22), widespread closure of 
services and industry, and any other emer- 
gency measures. These major changes to daily 
life provide an opportunity to study their envi- 
ronmental impacts, such as reductions in nitrous 
oxide emissions in the atmosphere (23). Record- 
ings of human-generated seismic vibrations that 
travel through the solid Earth provide insights 
into the dynamics of pandemic lockdowns. 
We assessed the effects of COVID-19 lock- 
downs on high-frequency (4 to 14 Hz) seismic 
ambient noise (hiFSAN) (24). We compiled a 
global seismic noise dataset using vertical- 
component seismic waveform data from 337 
broadband and individually operated citizen 
seismometer stations (24), such as Raspberry 
Shake instruments (RSs), with a self-noise 
well below the ground motion generated by 
anthropogenic noise (25) and flat responses 
in the target frequency band (Fig. 1). We ob- 
tained usable data (e.g., no large data gaps, 
working sensors) from 268 stations and de- 
tected pronounced reductions in hiFSAN during 
local lockdown measures at 185 stations (Fig. 2). 
Periods that are often seismically quiet include 
weekends, as well as the Christmas and New 
Year holidays for locations where they are 
celebrated. Notably, we found a near-global 
reduction in noise, commencing in China in 
late January 2020 (26), followed by Italy (26, 27), 
the whole of Europe, and the rest of the world in 
March to April 2020. This period of reduced 
noise lasted longer and was often quieter than 
the Christmas-to-New Year period. 
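
The band-limited noise level described above can be illustrated with a small sketch. This is not the authors' processing chain, which is documented in their supplementary materials (24); it only shows, under simple assumptions, how the root-mean-square (RMS) amplitude of a record restricted to the 4 to 14 Hz band could be computed. The function name `band_rms`, the 100 Hz sampling rate, and the synthetic trace are all hypothetical.

```python
import cmath
import math


def band_rms(samples, sample_rate_hz, f_lo=4.0, f_hi=14.0):
    """RMS amplitude of the part of `samples` lying in [f_lo, f_hi] Hz.

    Uses a direct discrete Fourier transform, so it is only practical
    for short records; the Nyquist bin is ignored.
    """
    n = len(samples)
    power = 0.0
    # Visit positive-frequency bins only; each is counted twice to include
    # its negative-frequency mirror (valid because the input is real).
    for k in range(1, (n + 1) // 2):
        freq = k * sample_rate_hz / n
        if f_lo <= freq <= f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(samples))
            power += 2.0 * abs(coeff) ** 2
    # Parseval's theorem: mean square of the band-limited signal = power / n^2.
    return math.sqrt(power) / n


# Synthetic 2-second record: an in-band 8 Hz tone (amplitude 2) plus an
# out-of-band 30 Hz tone (amplitude 5). Invented numbers, for illustration.
sample_rate_hz = 100.0
trace = [2.0 * math.sin(2 * math.pi * 8.0 * i / sample_rate_hz)
         + 5.0 * math.sin(2 * math.pi * 30.0 * i / sample_rate_hz)
         for i in range(200)]

rms_in_band = band_rms(trace, sample_rate_hz)
print(round(rms_in_band, 3))  # 1.414: only the 8 Hz tone contributes
```

Working in the frequency domain keeps the band edges explicit; a production workflow would more likely use a windowed power spectral density estimate over many short segments.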

In China (Fig. 3A), the COVID-19 outbreak 
and subsequent emergency measures occurred 
during the Chinese New Year (CNY). In Enshi, 
a city located in Hubei province where the 
outbreak began (28), hiFSAN in 2020 clearly 
diverged from the normal annual reduction 
during CNY. The hiFSAN level remained at a 
minimum, demarcated by the start and end 
of quarantine in Hubei, for several weeks after 
CNY. Although the quarantine measures in 
Beijing were less strict, local hiFSAN reductions 
were more pronounced and lasted longer than 
in recent years. As of the end date of our 
analysis, Beijing has still not reached the 
average hiFSAN level of previous years, which 
suggests that the impact of COVID-19 is con- 
tinuing to restrict anthropogenic noise there. 
We noticed a later hiFSAN lockdown reduc- 
tion in April 2020 in Heilongjiang (Fig. 3A), in 
northeast China, near the Russian border. 

Although we observed seismic effects of 
lockdown in areas with low population 
density estimates (<1 person per km²; Fig. 1), the 
strongest hiFSAN reduction occurred in popu- 
lated environments. For a permanent seismic 
station in Sri Lanka, a 50% reduction in 
hiFSAN occurred after lockdown, which is the 
strongest we observed in the available data 
from that station since at least July 2013 (fig. 
S2). In Central Park, New York, on Sunday 
nights, hiFSAN was 10% lower during the 
lockdown than before this period (fig. S3). 

Seismic networks in populated areas enable 
us to correlate hiFSAN with other human acti- 
vity measurements, such as audible recordings 
and flight data (24). At a surface station in 
Brussels, Belgium (Fig. 3B), we found a 33% 
reduction in hiFSAN after lockdown. We com- 
pared this noise level with data from a nearby 
microphone, located close to a major road, 
that mainly records audible traffic noise. We 
found a high correlation between prelockdown 
hiFSAN and audible noise, both showing 
characteristic diurnal and weekly changes. 
However, during lockdown, audible noise 
reductions were more pronounced, which sug- 
gests that seismometers are sensitive to a wide 
distribution of seismic sources, not just nearby 
traffic. Audible and hiFSAN levels then grad- 
ually increased after April 2020. Independent 
mobility data (24) provide insights into what 
caused these changes. Mobility correlates with 
hiFSAN at lockdown, with correlation coeffi- 
cients >0.8 (24), except for time spent at places 
of residence (Google’s “residential” category), 
which is expected given the increased number 
of people spending more time at home because 
of government restrictions. 
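
As a hedged illustration of the comparison described above, the sketch below expresses a noise series as a percentage change from a prelockdown baseline and computes Pearson correlation coefficients against two mobility series. All numbers are invented and the helper names (`percent_change`, `pearson_r`) are hypothetical; the study's actual data and method are in its supplementary materials (24).

```python
import math
from statistics import mean, median


def percent_change(series, baseline):
    """Express each value as a % change from the median of `baseline`."""
    base = median(baseline)
    return [100.0 * (value - base) / base for value in series]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented daily values, for illustration only.
prelockdown_noise = [10.0, 11.0, 10.0, 9.0, 10.0]
noise = [10.0, 9.0, 7.0, 6.0, 6.5, 7.0]
transit = [0.0, -15.0, -45.0, -60.0, -55.0, -50.0]   # % change in transit use
residential = [0.0, 5.0, 15.0, 22.0, 20.0, 18.0]     # % change in time at home

noise_change = percent_change(noise, prelockdown_noise)
r_transit = pearson_r(noise_change, transit)
r_residential = pearson_r(noise_change, residential)
```

In this toy case `r_transit` is strongly positive and `r_residential` strongly negative, mirroring the sign pattern reported in the text: noise falls as travel falls and as time at home rises.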

Citizen seismometers provide a different 
urban ground motion dataset, with denser 
coverage in some places. Large hiFSAN drops 
occurred particularly at schools and 
universities after lockdown-related closures [e.g., 
in Boston and Michigan (United States) and 
Cornwall (United Kingdom); fig. S4]. The 
hiFSAN level was even 20% lower than during 
school holidays, which indicates sensitivity to 
the environment outside of the school. 

The pandemic has also affected tourism— 
for example, during the holiday season in 
the Caribbean. In Barbados (Fig. 3C), hiFSAN 
decreased by ~45% after lockdown on 
28 March 2020 through April 2020 and stayed 
~50% below levels observed in previous years 
for the same period. However, seismic noise 
levels began to decrease 1 to 2 weeks before a 
local curfew was implemented. Local flight 
data (24) indicate that travel to Barbados 
started decreasing after 21 March 2020, and 
the overall reduction in hiFSAN might have 
been partly due to tourists repatriating. We 
also observed noise reductions due to decreased 
tourist activity at ski resorts in Europe (Zugspitze 
in Germany) and the United States (Mammoth 
Mountain in California) (fig. S5). 

Although we observed lockdown effects 
most prominently at surface stations, we also 
detected them underground. In New Zealand, 
seismometers installed in boreholes (to mini- 
mize the effects of anthropogenic noise) monitor 
potential hazards associated with the Auckland 
Volcanic Field (6, 8, 29). Station HBAZ is 380 m 
below the city, whereas MBAZ is at 98 m of 
depth, 14 km from the city center on the un- 
inhabited Motutapu Island (Fig. 3D). The 
hiFSAN level at both stations varied between 
weekdays and weekends before the lockdown, 
which suggests that both are sensitive to an- 
thropogenic activity. Although the island station 
is quieter overall, the lockdown instigated a 
reduction in hiFSAN by a factor of 2 for both 
stations. We attribute the remaining hiFSAN 
maxima on the island (mid-April 2020 and 
early May 2020) to strong winds and high 
waves. On 27 April 2020, New Zealand lifted 
restrictions, with hiFSAN increasing to the 
prelockdown levels. 

Fig. 3. Regional examples of the 2020 seismic noise quiet period. The examples show different features of the lockdown seismic signal changes in regional settings. We filtered the hiFSAN data between 4 and 14 Hz and present temporal changes as displacement (A), acceleration (D), or percentage change relative to the baseline before lockdown [(B), (C), and (E)], with the panels in (A) also relative to the baseline of corresponding time periods in previous years. Individual seismic stations are identified by codes in “network.station” format (IC.ENH, BE.UCCS, etc.). The keys in (B) to (E) include correlation coefficients (r) with mobility data (24). (A) Lockdown effects at three stations in China compared with the Chinese New Year holiday in previous years. (B) Lockdown effects on hiFSAN compared with audible environmental noise and independent mobility data in Brussels, Belgium. (C) Lockdown effect in Barbados compared with noise levels of the past decade (gray shading) and correlation with local flight data at the Grantley Adams International Airport (TBPB) (24). (D) Lockdown noise reduction recorded on borehole seismometers in Auckland, New Zealand. (E) Lockdown noise reduction in a region of low population density in Rundu, Namibia. 

[Figure key, station AF.RUDU: correlation coefficients with seismic data are transport (r = 0.89), retail (r = 0.68), workplaces (r = 0.82), and residential (r = -0.68); lines mark lockdowns and the lifting of lockdowns.] 

The reduction of hiFSAN was weaker in less 
populated areas such as Rundu, located along 
the Namibia-Angola border (Fig. 3E). After 
COVID-19 was confirmed in Namibia, an 
emergency was declared on 17 March 2020 to 
restrict mobility, followed by full lockdown on 
27 March 2020. These measures are reflected 
in the >25% hiFSAN reduction compared with 
prelockdown levels. Despite Rundu having a 
population roughly one-eighth and one-fifth 
as dense as those of Brussels and Auckland, 
respectively (30), we observed a similarly high 
correlation between seismic and mobility data. 
The Black Forest Observatory in Germany is 
an even more remote station, located 150 to 
170 m below the surface in crystalline bedrock. 
Although this station is considered a reference 
laboratory with low noise overall (32), we 
detected a small hiFSAN reduction during 
lockdown nights (fig. S6), corresponding to the 
lowest hiFSAN since at least 25 December 2015. 

Fig. 4. Global changes in seismic [...] changes are expressed relative to a prelockdown baseline. All categories show a strong positive correlation, apart from time spent in residential premises, which is anticorrelated. 

Here we have provided a global-scale analy- 
sis of high-frequency anthropogenic seismic 
noise. Global median hiFSAN dropped by as 
much as 50% during March to May 2020 (Fig. 
4). The length and quiescence of this period 
represent the longest and most coherent global 
seismic noise reduction in recorded history, 
emphasizing how human activities affect the 
solid Earth. A globally high correlation exists 
between changes in hiFSAN and population 
mobility (24), with correlations exceeding 0.9 
for many categories. 
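
A global median of the kind quoted above can be sketched as follows, assuming each station's noise level has already been expressed as a percentage change from its own prelockdown baseline. The station names and values are invented, and `global_median_change` is a hypothetical helper, not a function from the study.

```python
from statistics import median


def global_median_change(station_changes):
    """Per-day median across stations.

    `station_changes` maps a station name to a list of daily percentage
    changes; all lists are assumed to cover the same days in the same order.
    """
    per_day = zip(*station_changes.values())
    return [median(day) for day in per_day]


# Invented example: three stations, four days.
changes = {
    "STA1": [0.0, -10.0, -40.0, -50.0],
    "STA2": [0.0, -5.0, -20.0, -30.0],
    "STA3": [0.0, -20.0, -60.0, -70.0],
}

daily_median = global_median_change(changes)
print(daily_median)  # [0.0, -10.0, -40.0, -50.0]
```

A median is used rather than a mean so that a single station affected by a local source, such as nearby construction, does not dominate the global figure.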

This distinct low-noise period will help op- 
timize seismic monitoring efforts (4). The ability 
to analyze the full spectrum of seismogenic 
behavior, including the smallest earthquakes, 
is essential for monitoring fault dynamics over 
seismic cycles, as well as for earthquake fore- 
casting and seismic hazard assessment. Small 
earthquakes should dominate datasets (32), 
but typical operational catalogs using amplitude- 
based detection do not include many of the 
smallest earthquakes (33). This detection issue 
is especially problematic in populated areas, 
where anthropogenic noise energy interferes 
with earthquake signals. This problem is exem- 
plified by recordings of a moment magnitude 
5.0 earthquake at 15 km of depth southwest of 
Petatlan, Mexico, during lockdown (fig. S7). An 
earthquake with this magnitude and source 
mechanism that occurs during the daytime 
would typically be observed at stations in urban 
environments only if the signal was filtered. 
However, the reduction of seismic noise by 
~40% during lockdown made this event visible, 
without any filtering required, at an RS station 
in Querétaro city, 380 km away. Low noise 
levels during COVID-19 lockdowns could thus 
allow detection of signals from previously un- 
recognized sources in areas with incomplete 
seismic catalogs. Such newly identified signals 
could be used as distinct templates (32) for 
finding similar waveforms in noisier data 
before and after lockdown. This approach also 
works for tremor signals that are masked by 
anthropogenic noise yet vital for monitoring 
potential volcanic unrest (6). Although broad- 
band sensors in rural environments are less 
affected by anthropogenic noise, any densi- 
fication of and reliance on low-cost sensors in 
urban areas, such as RSs and low-cost accel- 
erometers (34), will require a better under- 
standing of anthropogenic noise sources to 
suppress false detections. As populations in- 
crease globally, more people become exposed 
to potential natural and induced geohazards 
(35). Urbanization will increase anthropogenic 
noise in exposed areas, further complicating 
seismic monitoring efforts. The ability to char- 
acterize and minimize anthropogenic noise is 
becoming increasingly important for accurate 
detection and imaging of seismic signatures of 
potentially harmful subsurface hazards. 
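
The template idea mentioned above, in which a waveform identified during the quiet period is used to search noisier data, is commonly implemented as a sliding normalized cross-correlation. The sketch below is a generic version of that technique under simple assumptions, not the workflow of the cited studies; the template, the background values, and the name `normalized_xcorr` are invented for illustration.

```python
import math


def normalized_xcorr(template, trace):
    """Normalized cross-correlation of `template` at every lag in `trace`.

    Each score lies in [-1, 1]; 1 means the window matches the template
    up to a positive scale factor and a constant offset.
    """
    m = len(template)
    t_mean = sum(template) / m
    t_dev = [t - t_mean for t in template]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    scores = []
    for lag in range(len(trace) - m + 1):
        window = trace[lag:lag + m]
        w_mean = sum(window) / m
        w_dev = [w - w_mean for w in window]
        w_norm = math.sqrt(sum(d * d for d in w_dev))
        if t_norm == 0.0 or w_norm == 0.0:
            scores.append(0.0)  # flat window: correlation undefined, no match
        else:
            dot = sum(a * b for a, b in zip(t_dev, w_dev))
            scores.append(dot / (t_norm * w_norm))
    return scores


# Invented template and low-level background.
template = [0.0, 1.0, -2.0, 3.0, -1.0, 0.5]
trace = [0.02, -0.01, 0.01, -0.02, 0.01, 0.0, -0.01, 0.02,
         -0.02, 0.01, 0.0, -0.01, 0.02, -0.01, 0.01]
for i, value in enumerate(template):
    trace[4 + i] += 0.5 * value  # half-amplitude copy starting at sample 4

scores = normalized_xcorr(template, trace)
best_lag = max(range(len(scores)), key=scores.__getitem__)
print(best_lag)  # 4
```

Because the score is normalized, a weaker repeat of the same event still scores close to 1, which is why a template found in quiet data can help recover similar events recorded at other times.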
Anthropogenic seismic noise is thought to 
be dominated by noise sources <1 km away 
from detectors (5-7, 11, 36). Because population 
mobility generates time-varying loads that ra- 
diate energy through the shallow subsurface 
as Rayleigh waves (11), local effects such as 
construction sites and heavy machinery can 
affect individual stations. However, the 2020 
seismic noise quiet period reveals that when 
considering multiple stations or whole net- 
works over longer time scales, the anthropo- 
genic seismic wavefield affects large areas. 
With denser networks and more citizen sensors 
in urban environments, additional features of 
the seismic noise, rather than just amplitude, 
will become usable and will help identify dif- 
ferent anthropogenic noise sources (10, 37). 
Characterizing these sources will be useful 
for imaging the shallow subsurface in three 
dimensions in urban areas by using high- 
frequency anthropogenic ambient noise (38, 39). 
Our finding of a distributed noise field is 
supported by strong correlations with inde- 
pendent mobility data (Fig. 4). In contrast to 
mobility data, publicly available data from 
existing seismometer networks provide an ob- 
jective absolute baseline of human activity 
levels. Therefore, hiFSAN can serve as a near- 
real-time technique for monitoring 
anthropogenic activity patterns, with fewer potential 
privacy concerns than those raised by mobility 
data collection. In addition, although indus- 
trial activities may not be captured in mobility 
data, they may produce a seismic noise sig- 
nature. The 2020 seismic quiet period is a base- 
line for using seismic properties (36) to identify 
and isolate the sources contributing to the 
anthropogenic noise wavefield, especially when 
combined with data indicative of human beha- 
vior. Seismic observations of human activity 
during COVID-19 lockdowns have enabled us 
to assess the impact of mitigation policies— 
particularly the time to establish and recover 
from lockdowns—on daily life. As such, hiFSAN 
may provide important constraints for future 
health and behavioral science studies. 
FERROELECTRICS 

Scale-free ferroelectricity induced by flat phonon bands in HfO2 

Hyun-Jae Lee, Minseong Lee, Kyoungjun Lee, Jinhyeong Jo, Hyemi Yang, Yungyeom Kim, 
Seung Chul Chae, Umesh Waghmare, Jun Hee Lee 


Discovery of robust yet reversibly switchable electric dipoles at reduced dimensions is critical to the 
advancement of nanoelectronics devices. Energy bands flat in momentum space generate robust 
localized states that are activated independently of each other. We determined that flat bands exist and 
induce robust yet independently switchable dipoles that exhibit a distinct ferroelectricity in hafnium 
dioxide (HfO2). Flat polar phonon bands in HfO2 cause extreme localization of electric dipoles within its 
irreducible half-unit cell widths (~3 angstroms). Contrary to conventional ferroelectrics with spread 
dipoles, those intrinsically localized dipoles are stable against extrinsic effects such as domain walls, 
surface exposure, and even miniaturization down to the angstrom scale. Moreover, the subnanometer- 
scale dipoles are individually switchable without creating any domain-wall energy cost. This offers 
unexpected opportunities for ultimately dense unit cell-by-unit cell ferroelectric switching devices that 
are directly integrable into silicon technology. 


Ferroelectricity arises from the spontaneous 
ordering of electric dipoles in a crystal 
that is reversibly switched to opposite 
directions under an applied electric field. 
A ferroelectric oxide, hafnium dioxide 
(HfO2), recently emerged as an interesting 
material because of its robust electric dipoles 
at nanometer thicknesses and ability to 
directly integrate into silicon devices (1-3). The 
switchability of electric dipoles in HfO2, a 
fluorite structure, is expected to be different 
from that in ABO3 perovskite-structure oxides 
(4), as hinted by its large coercive field (5, 6) 
and slow domain propagation (7). Unfortunately, 
the underlying reasons for the stable 
ferroelectricity and distinct switchability 
of HfO2 at an atomic level are poorly understood. 
The relationship between structure and 
ferroelectric properties of HfO2 is crucial for 
their use in advanced nanoelectronic devices 
such as nonvolatile memories and low-power 
logic (2). 

Fig. 1. Structural origin of alternating ferroelectric and nonpolar layers in orthorhombic HfO2. (A) Phonon dispersion of the cubic phase. The red dots labeled b, c, and d denote the primary instability of (B) X′2 mode, (C) Γ15^z, and (D) Y5^z, respectively, where arrows denote u, the displacements of oxygen atoms. (E) Polar Γ15^z and antipolar Y5^z phonons condense in-phase with equal magnitude to generate an orthorhombic structure that consists of alternating spacer layers and ferroelectric layers with up (top), and down (bottom) polarization, respectively. Silver spheres indicate Hf atoms; red and blue spheres indicate oxygen atoms in the ferroelectric layer with up and down polarization, respectively; and green spheres indicate oxygen atoms belonging to the spacer layer. 

We show that ferroelectric HfO2 possesses 
switchability that is robust even down to ir- 
reducible, subnanometer-scale dimensions. 
This behavior is due to the flat phonon bands 
intrinsic to the material. Whereas flat bands of 
electrons, photons, and magnons are known 
to cause exotic phenomena such as electron 
lattice (8), graphene superconductivity (9), 
and photon (10) and magnon localization (11),
flat bands of polar phonons and their conse- 
quences in ferroelectrics are not well under- 
stood. The emergence of flat phonon bands in 
HfO2 provides a missing link to extend those
exotic phenomena to ferroelectrics. 

We used first-principles calculations to dis- 
cover flat bands of polar phonons and con- 
sequent localized dipoles, which induce a 
scale-free ferroelectric order in HfO2. This
order contains a lateral array of vertically
aligned polar layers separated by nonpolar 
spacer layers that are each of half-unit cell 
width (~2.5 Å). The presence of the spacers in HfO2 laterally localizes the vertical
dipoles within its half-unit cell widths in a steplike manner from one polar layer to
the next. Contrary to conventional ferroelectrics, whose spread dipoles fade away below
critical nano-dimensions (4, 12-15), the localized dipoles, which are stable and
switchable down to the subnanometer scale, allow storage of bits in angstrom-size
lateral domains without costing any domain-wall formation energy. Vanishingly small
interactions between the ferroelectric dipoles, evidenced by flat bands in HfO2, explain
the unusual phenomena of its large coercive field (5, 6) compared with conventional
ferroelectrics (16-21) and its extremely slow domain propagation (7). Because HfO2 is
already integrated
into silicon technology, fabrication of ulti- 
mately dense memories could be accomplished 
by exploiting its irreducible unit cell-scale 
switchability. 

To determine the origin of unusual struc- 
tural features of the orthorhombic phase of 
HfO2 ($Pca2_1$), we analyzed the sequence of symmetry-lowering steps starting with the
cubic $Fm\bar{3}m$ structure of HfO2, which is known to be stable above 2870 K (22).
Upon cooling, it transforms into the tetragonal $P4_2/nmc$ phase at 2870 K and then into
the monoclinic phase at T = 2000 K. By contrast, the ferroelectric orthorhombic phase is
stabilized in thin films at room temperature (7). The phonon spectrum of the
high-temperature cubic phase (Fig. 1A) reveals the dominant phonon instability with
$X_2'$ symmetry at $\omega = -i\,228\ \mathrm{cm}^{-1}$, which involves antiparallel x
displacements of neighboring oxygen atoms in the yz plane (Fig. 1B). The cubic structure
transforms into a tetragonal structure through the condensation of an $X_2'$ phonon with
zero net polarization. Among the four phonons condensing in the transformation from
tetragonal to orthorhombic (fig. S1), we focused on (i) the $\Gamma_{15}^{z}$ phonon with
all oxygen atoms moving along the z direction (Fig. 1C), generating a uniform
polarization, and (ii) the cell-doubling antipolar phonon $Y_5^{z}$, where oxygen atoms
in neighboring xz planes move along the z axis in an antiparallel manner (Fig. 1D),
providing A-type ordering with zero net polarization. Similar to strained ZrO2 (23),
ferroelectricity of HfO2 is improper, caused by the nonlinear interaction of stable
$\Gamma_{15}^{z}$ and $Y_5^{z}$ phonons with the primary instability of the $X_2'$
phonon.

An unusual aspect of the ferroelectric order 
originates from polar $\Gamma_{15}^{z}$ and antipolar $Y_5^{z}$
phonons condensing with exactly identical 
amplitudes, which generate a dipolar parti- 
tioning into two types (24) of alternating 
atomically thin layers (Fig. 1E). The first type 
is the spacer with zero z displacements of
oxygen atoms, and the second is the ferro- 
electric layer with parallel z displacements 
of its oxygen atoms. Thus, spacers are dead 
layers that screen the elastic interaction be- 
tween the ferroelectric active layers. Experi- 
mental evidence for these layers can be seen 
in a study that used transmission electron 
microscopy and named them as minor and 
major layers, respectively (25). The structural 
characteristics of the spacer layers are discussed
in fig. S2.
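
The partitioning into spacer and ferroelectric layers can be illustrated with a minimal numerical sketch. It assumes a one-dimensional stack of oxygen layers indexed by n, a uniform displacement standing in for the polar mode, and a displacement that alternates in sign from layer to layer standing in for the cell-doubling antipolar mode. The function name and the unit amplitudes are illustrative choices, not quantities taken from the calculations reported here.

```python
def layer_displacements(n_layers, polar_amp, antipolar_amp):
    """Sum a uniform polar displacement and a sign-alternating antipolar one.

    Layer n receives polar_amp + antipolar_amp * (-1)**n, so the antipolar
    part adds to the polar part in even layers and subtracts in odd layers.
    """
    return [polar_amp + antipolar_amp * (-1) ** n for n in range(n_layers)]


# Equal amplitudes: the displacement cancels exactly in every other layer,
# giving alternating "ferroelectric" (2u) and "spacer" (0) layers.
print(layer_displacements(6, 1.0, 1.0))  # [2.0, 0.0, 2.0, 0.0, 2.0, 0.0]

# Unequal amplitudes: no layer is left with exactly zero displacement.
print(layer_displacements(6, 1.0, 0.5))  # [1.5, 0.5, 1.5, 0.5, 1.5, 0.5]
```

In this toy picture the zero-displacement layers appear only when the two amplitudes are exactly equal, which is the condition highlighted in the text.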

Natural dipolar partitioning in orthorhombic 
HfO2 has substantial consequences for its polarization domain structure, contrary to
that of perovskite ferroelectric PbTiO3. As the local polarization vanishes in the
spacer layer even in uniformly polarized HfO2, it inherently hosts a domain wall of
vanishing thickness between oppositely polarized (180°) domains (Fig. 2A). Such a domain
wall is essentially strain free (Fig. 2, $\varepsilon_i$, where $i = x$, $y$, and $z$;
and fig. S3), with little change in the local structure and supporting unsuppressed bulk
polarization in its neighborhood. This should lead us to expect a high energy cost of a
sharp domain wall in HfO2 because of the $g|\nabla \times P|^2$ term. But g ~ 0 is
evident in its flat band of polar phonons (Fig. 2B and supplementary text S1) and makes
a sharp domain wall feasible. This flatness of the polar bands results in a phonon
velocity of nearly zero (Fig. 2D). The origin of the flatness is discussed with a spring
model in fig. S4. The Γ point phonon in the flat
band of the lowest frequency (Fig. 2B, black line) involves atomic displacements of all
the modes condensed during the cubic-to-orthorhombic transition, and that of higher
frequency (Fig. 2B, red line) involves polar and antipolar modes. Physically, the
elastic interaction between ferroelectric domains is screened by the spacer layer. With
contribution mostly from dipole-dipole interactions, the domain wall energy of HfO2 is
weakly negative (~-18 mJ/m²) (supplementary text, section 2). By contrast, a domain wall
separating the 180° polar domains in PbTiO3 is diffuse, with a width of a few unit
cells, and the polarization is suppressed in its neighborhood (Fig. 2C). This difference
arises because some of the polar atomic displacements are shared between adjacent
domains in PbTiO3. The parameter g is sizable, as evident in its dispersed polar phonons
(26) and a finite phonon velocity in PbTiO3 (Fig. 2D).

Fig. 2. Flat bands and zero-width domain wall in HfO2, contrary to the diffused domain
wall in PbTiO3. (A and C) Atomic structure of the domain wall and variation in local
polarization along the direction perpendicular to the domain walls in (A) HfO2 and (C)
PbTiO3. Red and black lines correspond to the local polarization averaged over their
half- and single-unit cells, respectively. Although ($\nabla \times P$ = Pe) is small
and spreads over a few unit cells away from the domain wall in PbTiO3, it is singularly
large and highly localized at the domain wall in HfO2. (B) This is because of its low
energy cost (g ~ 0) guaranteed by the flatness of polar phonon bands involving the polar
and antipolar modes condensed in orthorhombic HfO2. Flat polar bands are indicated with
black and red lines, and eigenmodes at Γ and Y in the bands are depicted in the insets.
This flatness of the bands in HfO2 is in sharp contrast to a dispersive band in PbTiO3
[(B), top, inset]. (D) Phonon velocities of the flat bands in HfO2 are nearly zero,
whereas that of the polar band in PbTiO3 has a finite value. For PbTiO3, black spheres
indicate Pb atoms, and light blue and red spheres indicate titanium and oxygen atoms,
respectively.

We sought to establish the stability and switchability of a polar domain that is half a
unit cell wide, sandwiched between the spacers, 
by simulating reversal of its local polarization 
(Fig. 3A and fig. S6). The two-dimensional 
(2D) layer with flipped polarization has a 
robust stability, with a large energy barrier of 
1.34 eV that prevents it from switching back to 
the uniformly polarized state (Fig. 3B). The 
switching of polarization in the adjacent layer 
(Fig. 3B) results in a domain that consists of 
two ferroelectric layers sandwiching a spacer, 
following a path with a comparable energy 
barrier of 1.38 eV. Low dependence of domain 
wall energy on the width of polar domains 
reveals weak inter-domain wall interaction, as 
expected from the flat bands. By contrast, our 
simulation of a single-unit cell-wide domain 
in PbTiO3 (Fig. 3C) has substantially reduced
polarization at the diffused domain wall, and 
its marginal stability is evident in its tendency 
to expand spontaneously (with a small energy 
barrier of 0.024 eV) to domains of larger width (Fig. 3D). Despite the strongly knit 3D
crystal structure of HfO2, our results establish that it consists of weakly interacting
2D polar layers, allowing stable and switchable ferroelectric domains at the ultimate
limit of width (2.7 Å).

Fig. 3. Robust stability of a half-unit cell-wide ferroelectric domain. (A) Atomic
structure of the thinnest domain and variation in polarization along the direction
perpendicular to the domain walls in HfO2. (B) Energy along the path of polarization
switching of HfO2, starting from the uniformly polarized structure [(B), left inset]
to a state with reversed polarization in two layers [(B), right inset] passing through
the state in (A). (C) In contrast to the domain polarization that is unsuppressed
relative to bulk P in HfO2 (A), it is substantially suppressed inside the switched

We provide a possible explanation of the 
puzzling observation that the coercive field 
of polarization switching observed in HfO2 is unusually large and even comparable with
the activation field ($E_a$). The coercive field ($E_c$) in conventional ferroelectrics
is typically 1/10 of the activation field (Fig. 4A) (16) because polarization switching
occurs through nucleation and the growth of ferroelectric domains of reversed
polarization. Because of such collective behavior, $E_c$ is generally reduced by a
factor proportional to the width of the domain wall (27). In HfO2, domain walls are
vanishingly thin, and the resulting reduction in $E_c$ (Fig. 4A) is marginal. With
weakly interacting domain walls and zero group velocity of the relevant polar modes
(g ~ 0, by the flat bands of HfO2), domain walls do not propagate efficiently and can
move only by hopping over a large energy barrier (Fig. 3B), suggesting that their
sluggish motion observed experimentally could be an intrinsic property (7, 28). By
contrast, a domain wall in PbTiO3 encounters a much smaller energy barrier of 0.024 eV
(Fig. 3D), and its motion leads to rapid expansion of its polar domain.
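
The link between a flat band and a vanishing group velocity can be made concrete with a toy dispersion. The sketch below assumes a one-dimensional chain of identical local oscillators with on-site frequency `omega0` and a nearest-neighbor coupling `coupling`. This is a generic textbook model chosen for illustration, not the spring model of fig. S4, and the parameter values are arbitrary. As the coupling goes to zero, the bandwidth and the group velocity both vanish, which is the qualitative behavior the text attributes to g ~ 0.

```python
import math


def omega(k, omega0, coupling, a=1.0):
    """Frequency of the toy chain: omega^2 = omega0^2 + 2*c*(1 - cos(k*a))."""
    return math.sqrt(omega0 ** 2 + 2.0 * coupling * (1.0 - math.cos(k * a)))


def group_velocity(k, omega0, coupling, a=1.0):
    """Analytic d(omega)/dk for the dispersion above."""
    return coupling * a * math.sin(k * a) / omega(k, omega0, coupling, a)


def bandwidth(omega0, coupling, a=1.0):
    """Difference between zone-boundary and zone-center frequencies."""
    return omega(math.pi / a, omega0, coupling, a) - omega(0.0, omega0, coupling, a)


# Hypothetical couplings in units where omega0 = 1 and a = 1.
for c in (0.5, 0.05, 0.0):
    v_mid = group_velocity(math.pi / 2.0, 1.0, c)
    print(f"coupling={c}: bandwidth={bandwidth(1.0, c):.4f}, v(k=pi/2)={v_mid:.4f}")
```

With zero coupling each oscillator vibrates independently, so a local excitation does not spread. In the paper's language, that corresponds to dipoles that can be switched one layer at a time.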

We demonstrated the scale-free nature of 
polarization switching in HfO2 by comparing reversal of uniform and local polarization
(Fig. 4B). Energetics of local and uniform polarization switching in HfO2 are strikingly
similar. Flipping the polarization of a single layer is nearly energetically equivalent
(per layer) to flipping the polarization of all layers. By contrast, reversal of local
polarization in a single unit cell-wide region is energetically forbidden in PbTiO3.
Total energy along the polarization reversal in consecutive 2D polar domains in HfO2 is
a periodic function of the number of unit cell-width domains switched (Fig. 4C). The
equal multistability and identical switching barriers show absolutely scale-free
behavior that can be labeled by an integer (the number of unit cells switched) and can
be a basis for a multilevel device whose number of states is similar to the number of
lateral unit cells. In contrast to this multistate polar nature in HfO2, only
bistability in PbTiO3 is evident from its uniformly
polarized states (+Py), which are more stable 
than the rest of the intermediate polar states 
(Fig. 4C, bottom). 
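
The periodic energy landscape described above can be sketched with a toy model of non-interacting layers. The example assumes that each layer follows the same double-well energy in a normalized polarization p (minima at p = +1 and p = -1) and that layers switch one after another. The quartic double-well form is an illustrative assumption; only the 1.34 eV barrier height is borrowed from the text. Because the layers do not interact in this model, the total energy depends only on the layer that is currently switching, so it repeats with period one in the switching coordinate.

```python
def layer_energy(p, barrier):
    """Double-well energy of one layer: zero at p = +/-1, `barrier` at p = 0."""
    return barrier * (1.0 - p * p) ** 2


def path_energy(s, barrier):
    """Total energy at switching coordinate s for sequential layer flips.

    The integer part of s counts layers already switched; the fractional part
    is the progress of the layer currently switching. Layers are assumed not
    to interact, so only the switching layer contributes.
    """
    frac = s - int(s)
    p = 1.0 - 2.0 * frac  # goes from +1 to -1 as the layer flips
    return layer_energy(p, barrier)


BARRIER_EV = 1.34  # barrier quoted in the text for the first switched layer

for s in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(s, path_energy(s, BARRIER_EV))
```

Every integer value of s is an equally stable state and every half-integer value sits at the same barrier. This is the sense in which the switching can be labeled by an integer.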
domain as well as a few unit cells away from the domain wall of PbTiO3. (D) The
robust stability of a half-unit cell-wide domain of HfO2 (B) is in complete
contrast with the marginal stability of the single-unit cell domain in PbTiO3,
which spontaneously expands to wider domains. Successive switching in the next
layer [(B), red line] of HfO2 has an energy barrier comparable with that of the
first switching process [(B), blue line] in HfO2 in contrast to a small energy
barrier of successive switching in PbTiO3 (D).


To establish the intrinsic size limit on ferroelectricity
in HfO2, we simulated Hf-terminated
slabs (Fig. 4D) perpendicular to (i) the (010) 
axis with in-plane polarization and (ii) the (001) 
axis with out-of-plane polarization. The polar- 
ization in a ferroelectric layer of (010) slabs sur- 
vives in a scale-free manner down to 1.5-unit 
cell thickness, with spacers acting as natural 
protective coatings. The polarization retains 
its bulk value in (001) slabs down to single- 
unit cell thickness, as expected from the im- 
proper nature of fluorite ferroelectricity (29). 
Thus, the intrinsic lateral and perpendicular 
size limits on ferroelectric order in HfO2 films
are 0.75 and 0.51 nm, respectively (fig. S5). Ro- 
bust ferroelectric order appears to exist down 
to 1-nm thickness, as recently reported (30), 
which verifies one of our predictions. Now, the 
storage size limitation is only from the elec- 
trode and the transistor used to interface with 
it for readout operations. 

Because HfO2 is already compatible with silicon
electronics, our discovery of independently
switchable polar layers could provide oppor- 
tunities to realize ultradense and low-cost 
ferroelectric random-access memory (FeRAM) 
or a ferroelectric field-effect transistor (FeFET) 
for memory or logic device applications (figs. S11 and S12). In addition, the
possibility of unit cell-by-unit cell dipolar control provides different opportunities
for deterministic multilevel switching (figs. S6, S7, and S12), ultimately down to the
angstrom scale.

Fig. 4. Experimental activation and coercive fields revealing unusual
switching behavior of HfO2 and simulated energetics demonstrating its
fully scale-free 2D domain switching of 0.27-nm width. (A) Experimental
values of activation ($E_a$) and coercive ($E_c$) fields for various ferroelectrics [BaTiO3,
(16); Pb(Zr,Ti)O3, (17, 18); BiFeO3, (19, 21); Pb(MgNb)O3, (20); PbTiO3, (21);
and (Hf,Zr)O2, (5, 6)]. Although $E_c$ is 10 times smaller than $E_a$ in conventional
ferroelectrics, these values are comparable in HZO, imp
individual domain switching occur at the same field. (B)
along paths of switching uniform and local (one unit cel


An excess of small-scale gravitational lenses observed in galaxy clusters


Massimo Meneghetti, Guido Davoli, Pietro Bergamini, Piero Rosati, Priyamvada Natarajan,
Carlo Giocoli, Gabriel B. Caminha, R. Benton Metcalf, Elena Rasia, Stefano Borgani,
Francesco Calura, Claudio Grillo, Amata Mercurio, Eros Vanzella


Cold dark matter (CDM) constitutes most of the matter in the Universe. The interplay between 
dark and luminous matter in dense cosmic environments, such as galaxy clusters, is studied 
theoretically using cosmological simulations. Observations of gravitational lensing are used to 
characterize the properties of substructures—the small-scale distribution of dark matter—in 
clusters. We derive a metric, the probability of strong lensing events produced by dark-matter 
substructure, and compute it for 11 galaxy clusters. The observed cluster substructures are more 
efficient lenses than predicted by CDM simulations, by more than an order of magnitude. We suggest 
that systematic issues with simulations or incorrect assumptions about the properties of dark
matter could explain our results.


In the standard cosmological model, the matter
content of the Universe is dominated by
cold dark matter (CDM), collisionless par- 
ticles that interact with ordinary matter 
(baryons) only through gravity. Gravita- 
tionally bound dark-matter halos form hier- 
archically, with the most massive systems 
forming through mergers of smaller ones. As 
structure assembles in this fashion, large dark- 
matter halos contain smaller-scale substruc- 
ture in the form of embedded subhalos. 

The most massive dark-matter halos at the 
present time are galaxy clusters, with masses of 
~$10^{14}$ to ~$10^{15}$ solar masses ($M_\odot$; one solar mass
is ~$2 \times 10^{30}$ kg). Galaxy clusters contain about
a thousand member galaxies that are hosted 
in subhalos. The detailed spatial distribution of 
dark matter in galaxy clusters can be mapped 
by observing gravitational lensing of distant 
background galaxies. When distant background 
galaxies are in near perfect alignment with the 
massive foreground cluster, strong gravitational
lensing occurs. Strong lensing (nonlinear effects
produced by the deflection of light) results in
multiple distorted images of individual back- 
ground galaxies that can be detected in Hubble 
Space Telescope (HST) imaging. 

The probability and strength of these non- 
linear strong lensing effects can be predicted 
theoretically from simulations of structure 
formation (1). We test these predictions using 
observations of galaxy clusters, combining 
lensing data from the HST with spectroscopic 
data from the Very Large Telescope (VLT). Our 
observed sample of lensing clusters is split 
into three sets for this analysis: (i) a reference 
sample comprising three clusters with well- 
constrained mass distributions (mass models): 
MACS J1206.2-0847 (MACSJ1206) at redshift z = 0.439, MACS J0416.1-2403 (MACSJ0416)
at z = 0.397, and Abell S1063 (AS1063) at z = 0.348 (2-6); (ii) a sample that includes
the publicly available mass models for four Hubble Frontier Fields clusters [HFF, (7)],
namely Abell 2744 at z = 0.308, Abell 370 at z = 0.375, MACS J1149.5+2223 (MACSJ1149)
at z = 0.542, and MACS J0717.5+3745 (MACSJ0717) at z = 0.545; and (iii) four clusters
from the Cluster Lensing and Supernova Survey with Hubble [CLASH, (8)] project, with
recent mass reconstructions [(9), their “Gold” sample]: RX J2129.7+0005 (RXJ2129) at
z = 0.234, MACS J1931.8-2635 (MACSJ1931) at z = 0.352, MACS J0329.7-0211 (MACSJ0329)
at z = 0.450, and MACS J2129.4-0741 (MACSJ2129) at z = 0.587.
A color-composite image of MACSJ1206, one 
of the clusters in our reference sample (i), is 
shown in Fig. 1. Images of the other clusters 
are shown in figs. S1 to S3. 

Owing to their large masses, all these galaxy 
clusters act as strong lenses, producing multi- 
ple images of numerous background galaxies. 
To reconstruct their mass distributions, we 
combine the images with available spectroscopic
data (3, 10). For each cluster, the membership
of hundreds of galaxies is confirmed
spectroscopically, and their redshifts have been 
measured. The spectroscopy has also allowed 
identification of tens of multiply imaged back- 
ground sources per cluster. 

Mass models for the reference cluster sam- 
ple were constructed by using the publicly avail- 
able parametric lens inversion code LENSTOOL 
(11) and published previously (6). Clusters were 
modeled as a superposition of large-scale com- 
ponents to account for the large-scale cluster 
dark-matter halos, and small-scale components 
that describe the substructure. We associate 
the spatial positions of cluster member gal- 
axies with the locations of dark-matter sub- 
structure. The detailed mass distribution in 
these cluster galaxies is constrained using 
stellar kinematics measurements of cluster 
member galaxies from the VLT spectroscopy. 

The mass models for the clusters in the other 
two samples are built similarly (12); however,
unlike the reference sample, the mass distribu- 
tion in the cluster member galaxies is not con- 
strained using data from stellar kinematics. 
For the HFF sample, a suite of lensing mass 
models constructed independently by several 
groups are publicly available from the Mikulski 
Archive for Space Telescopes (MAST); we used 
only those built using LENSTOOL for con- 
sistency [e.g., (13, 14)]. For the “Gold” sam- 
ple, we use published models (9) that were also 
built with LENSTOOL. 

The multiple images of distant sources lensed 
by foreground galaxy clusters have angular 
separations of several tens of arcseconds. The 
most distorted gravitational arcs occur near 
lines that enclose the inner regions of the clus- 
ter, referred to as critical lines, which delineate 
the region where strong lensing occurs. The 
size of the critical lines depends on the red- 
shifts of the background sources. Substructures 
within each cluster act as smaller-scale gra- 
vitational lenses embedded within the larger 
lens. If these substructures are massive enough 
and compact enough, they can also produce 
additional local strong lensing events on much 
smaller scales with separations of less than a 
few arcseconds. These small-scale features are 
expected to appear around the critical lines 
produced by individual cluster galaxies. We re- 
fer to these localized features as Galaxy-Galaxy 
Strong Lensing (GGSL) events. Sufficiently high- 
resolution mass reconstructions are necessary 
to recover these smaller-scale critical lines. For 
example, Fig. 1 shows the network of critical 
lines in MACSJ1206 for two possible source 
redshifts, $z_s = 1$ and $z_s = 7$. The cluster produces
a large-scale critical line extending to 15 to 
30 arc sec and many smaller-scale critical lines 
around individual substructures, as shown in 
the insets. The presence of secondary critical 
lines indicates that the substructures are cen- 
trally concentrated and massive enough to 
act as individual strong lenses.

Fig. 1. Color-composite image of the central region of the galaxy cluster MACSJ1206. (A to D) The
image combines HST observations in the filters F105W, F110W, F125W, F140W, F160W (red channel), F606W,
F625W, F775W, F814W, F850LP (green channel), and F435W and F475W (blue channel). The dashed and
solid lines in (A) show the critical lines of the cluster for source redshifts of 1 and 7, respectively. Panels
(B), (C), and (D) zoom into three GGSL events enclosing sources at redshifts 1.425, 4.996, and 3.753,
respectively. The white lines in those panels show the critical lines of the lenses for the corresponding source
redshifts. In (B) and (D), the background lensed sources are bluer than the foreground lensing galaxies.
In (C), the lensed source is not visible in the HST image but is detected in an observation with the Multi-Unit
Spectroscopic Explorer (MUSE) spectrograph on the VLT (12). The source is detected at a wavelength of
~7289 Å, corresponding to the redshifted Lyman-α spectral line of hydrogen, at locations indicated by the cyan
contours. The white crosses indicate the positions of four multiple images of the source. Equivalent images
for all the other clusters are shown in figs. S1 to S3.
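
As a quick consistency check on the caption, the observed wavelength of a redshifted line follows from the standard relation λ_obs = λ_rest (1 + z). The sketch below uses the well-known rest wavelength of hydrogen Lyman-α (1215.67 Å) and the source redshift of panel (C); the function names are illustrative.

```python
LYA_REST_ANGSTROM = 1215.67  # rest-frame wavelength of hydrogen Lyman-alpha


def observed_wavelength(rest_angstrom, z):
    """Wavelength at which a line emitted at rest_angstrom is observed."""
    return rest_angstrom * (1.0 + z)


def redshift_from_wavelength(observed_angstrom, rest_angstrom):
    """Invert the relation above to recover the source redshift."""
    return observed_angstrom / rest_angstrom - 1.0


# Source in panel (C), at redshift 4.996.
print(round(observed_wavelength(LYA_REST_ANGSTROM, 4.996)))  # 7289
```

The result matches the ~7289 Å quoted for the MUSE detection.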


We identify three GGSL events in the core 
of the cluster MACSJ1206, shown in Fig. 1, B to 
D: a ring-shaped image (an Einstein ring) originating
from a source at z = 1.42; a triply
imaged galaxy at z = 3.75 (15); and an Einstein
cross with four distinct images of a source at
z = 4.99. The consistency between the shapes
of the GGSL events and the predicted critical 
lines from the lens modeling, also shown in Fig. 
1, B to D, validates our multiscale mass model. 

Just as the observed gravitational arcs are 
lensed images of distant galaxies, the crit- 
ical lines are the lensed counterparts of the 
caustic lines (7), shown in Fig. 2, B and D. The 
caustics enclose the regions in which sources 
have to be located to be strongly lensed by 
substructures. We quantify the probability of 
observing GGSL events using the fraction of
the area of the sky inside the caustics produced
by substructures. Figure 3 shows how the
GGSL probability varies as a function of the
source redshift for all clusters in our three
samples. For MACSJ1206 (upper limit of the
reference sample), it is ~$10^{-3}$ at z > 2. This
probability can in turn be converted into an 
expected number of GGSL events by assum- 
ing the properties of the background source 
population of galaxies that can be lensed. Using 
galaxies seen in the Hubble Ultra-Deep Field 
(HUDF) (16) as a representative template for
the properties of the background lensed sources, 
we calculate that <3 GGSL events should oc- 
cur in MACSJ1206, in agreement with the ob- 
servations. Equivalent estimates for MACSJ0416 
and AS1063 predict ~1 and ~0.9 events, respectively.
In these two cases, our calculations
underpredict the number of observed GGSL
events, as three candidate events have been
reported in each of the two clusters (17, 18).
This underestimate is likely because the HUDF 
may not be an appropriate template for back- 
ground sources in these two clusters (12). Never- 
theless, we find that GGSL events are detected 
in multiple clusters. Twenty-four GGSL candi- 
date events have been found in other CLASH 
clusters, including four events in MACSJ1149 
and one event in each of the clusters MACSJ0717,
RXJ2129, and MACSJ0329 (18). 
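
The GGSL probability defined above is a ratio of areas in the source plane, which can be sketched as follows. The example assumes that each substructure caustic and the reference region are available as simple, non-overlapping polygons, and it uses the shoelace formula for their areas. The function names and the geometry are hypothetical and are not taken from the lens models used in this work.

```python
def polygon_area(vertices):
    """Shoelace formula for a simple polygon given as [(x, y), ...]."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def ggsl_probability(secondary_caustics, source_plane_region):
    """Area enclosed by substructure caustics divided by the reference area.

    Assumes the caustic polygons do not overlap one another.
    """
    caustic_area = sum(polygon_area(c) for c in secondary_caustics)
    return caustic_area / polygon_area(source_plane_region)


# Hypothetical geometry in arbitrary angular units (not real data).
region = [(0, 0), (100, 0), (100, 100), (0, 100)]
caustics = [
    [(10, 10), (12, 10), (12, 12), (10, 12)],  # area 4
    [(50, 50), (53, 50), (53, 52), (50, 52)],  # area 6
]
print(ggsl_probability(caustics, region))  # 0.001
```

Turning such a probability into an expected number of events additionally requires an assumed background source population, as described in the text.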

We next consider whether the observed number of GGSL events is consistent with
theoretical predictions within the standard cosmological model. We performed the same
analysis and computed the GGSL probability for 25 simulated galaxy clusters, which have
masses, redshifts, morphologies, and mass concentrations similar to those in our three
observed samples (12). The
cosmological hydrodynamical simulations from 
which these simulated clusters are drawn (19) 
incorporate gas cooling, star formation, and 
energy feedback from supernovae and accret- 
ing supermassive black holes (SMBHs). 

Figure 2 shows a comparison between the 
critical lines and the caustics of MACSJ1206 
(panels A and B) and those of a simulated clus- 
ter of similar mass and concentration (panels C 
and D). MACSJ1206 has many more secondary 
critical lines within the observed area. The frac- 
tional area of the source plane that is enclosed 
by substructure caustics is larger in observed 
clusters than predicted by the simulated sam- 
ple, as is the probability of GGSL events. Figure 3 
shows that the GGSL probability differs by 
more than an order of magnitude between the 
observations and simulations. 

We performed several tests to investigate 
potential sources of this discrepancy (12). The 
results remain unchanged even when energy 
feedback from active galactic nuclei powered 
by SMBH accretion—which alters the internal 
structure of halos—is disabled in the simula- 
tions. This feedback suppresses star forma- 
tion in substructures, altering the slope of their 
inner density profiles, making them less cen- 
trally concentrated and, hence, weaker grav- 
itational lenses. Even without feedback, we are 
unable to completely bridge the gap between 
simulations and observations. Simulations 
without feedback are also grossly discrepant 
from observations for other well-measured 
quantities, such as the total fraction of baryons 
in clusters converted into stars. The mass and 
spatial resolutions of our simulations are suf- 
ficiently high to resolve the typical substruc- 
tures included in the lensing mass models 
(12). We also exclude the possibility that the 
computed GGSL probability could be enhanced 
by unassociated halos along the line-of-sight 
(LOS) to these clusters. Including multiple lens 
planes in the models generated using cosmo- 
logical simulations, we find that the substructure 
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Fig. 2. Comparison between an observed and a simulated gravitational lens. 
(A) The projected mass map (called convergence) of MACSJ1206 (color bar), 
overlaid with the critical lines for sources at redshift z = 7 (solid white lines). The 
dashed polygon delimits the region of the HST image within which cluster galaxies 
were selected and included in the lens model. (B) The caustics corresponding to 
the principal (in gray) and to the secondary critical lines (in red) of MACSJ1206 (12). 
The dashed gray line shows the limits of the field of view in (A) mapped into the 


critical lines and caustics are negligibly affected 
by halos along the LOS (72). The observationally 
constrained lens models reproduce the shapes 
and sizes of the observed GGSL events, eg., 
the model-predicted image positions match the 
observations within ~0.5 arc sec. 

The discrepancy between observations and 
simulations may be due to issues with either 
the CDM paradigm or simulation methods. 
Gravitational lensing has previously been used 
to probe detailed properties of dark-matter 
halos associated with individual cluster gal- 
axies [e.g., (20, 27)]. Simulations show that the 
mass and radial distributions of subhalos are 
nearly universal (22). Varying results have been 
reported for the level of agreement between 
lens model predictions and simulations for 
other derived quantities; e.g., the mass distribu- 
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X 1 [arcsec] 


tion functions of substructure derived from 
lensing data agree with simulations, but their 
radial distributions are more centrally con- 
centrated in observations than in simulations 
(5, 14, 15). Strong lensing clusters also contain 
more high-circular velocity subhalos (i.e., sub- 
halos with maximum circular velocities Voir. > 
100 km s~!) compared with simulations (5, 15, 23). 
The maximum circular velocity is given by 


Viire = max (1) 
where G is the gravitational constant, M(r) is 
the galaxy mass profile, and 7 is the distance 
from the galaxy center. Figure 4 shows that, in 
our lens models, observed galaxies have larger 
circular velocities than their simulated ana- 
logs at a fixed mass. This implies that dark- 


GM(r) 
—— 
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source plane (12). The GGSL probability is calculated by dividing the area of the 
secondary caustics by that enclosed by the dashed gray line. (€) The projected mass 
map and the critical lines for sources at redshift z = 7 of a simulated cluster with a 
mass similar to that of MACSJ1206 (12). The dashed polygon is the same as in 

(A). (D) Caustics of the simulated cluster shown in (C). Although the main critical lines 
and caustics have similar extents, the secondary critical lines and caustics are larger 
and more numerous in the lens model of MACSJ1206 than in the simulation. 


matter subhalos associated with observed 
galaxies are more compact than theoretically 
expected. Observed substructures also appear 
to be in closer proximity to the larger-scale 
cluster critical lines. Explaining this difference 
requires the existence of a larger number of 
compact substructures in the inner regions 
of simulated clusters. Baryons and dark mat- 
ter are expected to couple in the dense inner 
regions of subhalos, leading to alterations in 
the small-scale density profile of dark matter, 
so it could be that current understanding of 
this interplay is incorrect. Alternatively, the 
difference could arise from incorrect assump- 
tions about the nature of dark matter. 
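
Equation 1 can be evaluated numerically for any assumed mass profile. The sketch below is illustrative only: it adopts a Hernquist profile, M(r) = M_tot r^2 / (r + a)^2, which is an assumption made here for convenience and not the profile used in the lens models. The function names, the total mass, and the scale radii are likewise hypothetical. It shows that, at fixed total mass, a more compact profile (smaller scale radius a) gives a larger maximum circular velocity, which is why the circular velocity can serve as a proxy for compactness.

```python
import math

# Gravitational constant in kpc (km/s)^2 / Msun.
G = 4.30091e-6


def hernquist_mass(r_kpc, m_tot_msun, a_kpc):
    """Enclosed mass of a Hernquist profile (an assumed, illustrative choice)."""
    return m_tot_msun * r_kpc**2 / (r_kpc + a_kpc) ** 2


def max_circular_velocity(mass_profile, r_min=0.01, r_max=1000.0, n=20000):
    """Evaluate V_circ = max over r of sqrt(G M(r) / r) on a logarithmic grid."""
    best = 0.0
    for i in range(n):
        r = r_min * (r_max / r_min) ** (i / (n - 1))
        best = max(best, math.sqrt(G * mass_profile(r) / r))
    return best


m_tot = 1.0e11  # hypothetical subhalo mass in solar masses
v_extended = max_circular_velocity(lambda r: hernquist_mass(r, m_tot, 10.0))
v_compact = max_circular_velocity(lambda r: hernquist_mass(r, m_tot, 2.0))
print(f"a = 10 kpc: {v_extended:.1f} km/s; a = 2 kpc: {v_compact:.1f} km/s")
```

For this particular profile the maximum occurs at r = a, where V_circ = sqrt(G M_tot / 4a), so the grid search can be checked against a closed form.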
Previous discrepancies between the pre- 
dictions of the standard cosmological model 
and data on small scales have arisen from
observations of dwarf galaxies and of satellites
of the Milky Way, known as the “missing
satellite” (24, 25), “cusp-core” (26), and
“too-big-to-fail” problems (27, 28), as well as
discrepancies with planes of satellite galaxies (29).
The discrepancy that we report is unrelated 
to those issues. Previous studies revealed that 
observed small satellite galaxies were fewer 
in number and were less compact than expected
from simulations; here, we find the
opposite results for cluster substructures. The 
GGSL events that we observe show that sub- 
halos are more centrally concentrated than 
predicted by simulations; i.e., there is an excess,
not a deficit. Hypotheses advocated to
solve previous controversies on dwarf galaxy 
scales would only exacerbate the discrepancy 
in GGSL event numbers that we report.

Fig. 3. The GGSL probability as a function of source redshift. The mean GGSL probability for our
reference sample is shown with a solid dark blue line. The light blue dot-dashed and violet dotted lines plot 
the computed GGSL probability for the HFF and CLASH Gold samples. The median GGSL probability 
measured from simulations is given by the orange dashed line (12). The colored bands show the 99.9% 
confidence intervals for each dataset. The discrepancy between observations and simulations is about an
order of magnitude.
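
The GGSL probability plotted in Fig. 3 is a ratio of source-plane areas: the area enclosed by the secondary caustics divided by the area of a reference region. A minimal sketch of that ratio follows, assuming each caustic and the reference region are available as closed polygons of (x, y) vertices in arcseconds. The function names and the toy vertex lists are hypothetical and stand in for real lens-model output.

```python
def polygon_area(vertices):
    """Area of a simple polygon from the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def ggsl_probability(secondary_caustics, reference_region):
    """Summed secondary-caustic area divided by the reference-region area.

    Assumes the caustics do not overlap, so their areas can simply be added.
    """
    caustic_area = sum(polygon_area(c) for c in secondary_caustics)
    return caustic_area / polygon_area(reference_region)


# Toy geometry (hypothetical): two small square caustics in a 10" x 10" region.
region = [(0, 0), (10, 0), (10, 10), (0, 10)]
caustics = [
    [(1, 1), (2, 1), (2, 2), (1, 2)],
    [(5, 5), (5.5, 5), (5.5, 5.5), (5, 5.5)],
]
probability = ggsl_probability(caustics, region)
print(probability)
```

Repeating the calculation with caustics computed at several source redshifts would trace out a curve of the kind shown in Fig. 3.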


Fig. 4. Circular velocities and positions of substructures
in simulated and observed galaxy clusters. (A) Substructure
circular velocity as a function of substructure mass $M_{\mathrm{sub}}$. The
circular velocity is a proxy for the concentration of the substructure
mass. The solid black line shows the average relation
for the reference sample (6). The colored circles show the
simulations, color-coded by the substructure distance from
the cluster center R in units of the cluster virial radius $R_{\mathrm{vir}}$. The
orange dashed curve shows the best-fitting model relation
for simulated substructures whose distance is less than 15% of the virial radius.
This is roughly the region around the cluster center probed by strong lensing.
The observed relation is always above that derived from the simulations,
indicating that observed substructures are more compact than the simulated
ones. (B) Mean cumulative distribution of the substructure distances from the

Our results therefore require alternative explanations.
One possibility is numerical effects
arising from the resolution limits of simulations
(30). However, known numerical artefacts
are not effective enough at disrupting
satellites. We investigated this issue (12) and
found that it can change the predicted GGSL 
event rate by at most a factor of 2, which is 
insufficient to explain the nearly order-of- 
magnitude discrepancy that we find. Any nu- 
merical artefacts would also appear on galactic 
scales, where they would worsen the missing 
satellite problem.

Keystone predators govern the pathway and pace of
climate impacts in a subarctic marine ecosystem

Predator loss and climate change are hallmarks of the Anthropocene, yet their interactive effects are
largely unknown. Here, we show that massive calcareous reefs, built slowly by the alga Clathromorphum 
nereostratum over centuries to millennia, are now declining because of the emerging interplay 
between these two processes. Such reefs, the structural base of Aleutian kelp forests, are rapidly 
eroding because of overgrazing by herbivores. Historical reconstructions and experiments reveal that 
overgrazing was initiated by the loss of sea otters, Enhydra lutris (which gave rise to herbivores 
capable of causing bioerosion), and then accelerated with ocean warming and acidification (which 
increased per capita lethal grazing by 34 to 60% compared with preindustrial times). Thus, keystone 
predators can mediate the ways in which climate effects emerge in nature and the pace with
which they alter ecosystems.


Predator loss and climate change are
defining features of the Anthropocene
(1-6). However, these processes have
mostly been explored independently
even in well-studied ocean ecosystems
where the impacts of predator loss and climate 
change are both pronounced (5, 7). Because the 
interplay between these processes and their 
combined impacts are largely unknown, our 
ability to predict the mode and pace of eco- 
system change in the Anthropocene is limited. 
Here, we address this limitation by reveal- 
ing how keystone predator loss and climate 
change are together reshaping kelp forests 
of the remote Aleutian archipelago (8, 9) 
(Fig. 1A). 

Aleutian kelp forests are built upon a vast 
framework of Clathromorphum nereostratum, 
a long-lived red alga that forms massive lime- 
stone structures covering 50 to 100% of the 
shallow seafloor (Fig. 1B). These living reefs, 
assembled slowly (~0.35 mm of vertical growth/year)
over centuries to millennia (10), serve
as a habitat to many other species (11). They
dominate the seafloor when kelp forests prevail
(12) and have persisted through recent
centuries when this ecosystem was deforested
by herbivores, principally because the alga’s 
calcified morphology makes it especially resistant
to grazing (13). Like tropical corals (14),
however, this calcifying reef builder may be 
especially sensitive to climate-induced changes 
in seawater temperature and acidity (15), and 
the alga’s skeleton indeed appears to have 
weakened in recent decades (16). Moreover, 
sea otters (Enhydra lutris), which maintain 
Aleutian kelp forests through a trophic cas- 
cade (8), have rapidly disappeared from south- 
west Alaska over the past 30 years (table S1) 
perhaps because of increased predation by 
killer whales (9), which ostensibly shifted their 
diet in response to industrial whaling (17). 
With this collapse, the sea otter’s main prey, 
the herbivorous sea urchin Strongylocentrotus 
polyacanthus, proliferated and denuded the 
region of kelp (table S2). We thus hypothesized 
that C. nereostratum reefs may now be suscep- 
tible to rapid destruction through overgraz- 
ing, given that (i) sea urchins, the system’s 
only major herbivore, are now hyperabundant; 
(ii) the alga’s skeleton weakened rapidly in the 
early 2000s (16), which could have increased the
intensity (depth/bite) with which sea urchins 
can graze (13) and thus the lethality of grazing 
in recent time; and (iii) warming is postulated 
to elevate herbivore grazing rates in the ocean 
(18). To evaluate this hypothesis, we surveyed
multiple islands across >700 km of the archi- 
pelago (Fig. 1A), quantifying the impacts of sea 
urchin grazing on C. nereostratum from 2014 
to 2017; reconstructed the history of sea urchin 
grazing frequency on C. nereostratum (through 
grazing scars archived in the alga’s skeleton) 
and modeled the putative drivers of change 
through time; and used controlled experiments 
to isolate the manner and degree to which 
present-day seawater conditions have altered
the net impacts of sea urchin grazing relative
to the preindustrial era. 

Lethal grazing of C. nereostratum [i.e., re- 
peated grazing of tissues to a depth of >0.25 mm, 
far below the regenerative cell layer (10, 13); 
hereafter referred to as “bioerosion”] was se- 
vere and widespread. At each study site in 
2014, 40 to 85% of every colony was bioeroded, 
establishing that much of each colony had lost 
its living tissue and in turn the capacity to gen- 
erate new growth (Fig. 1C). Sea urchin grazing 
scars were up to 2.5 mm deep (Fig. 1D), re- 
vealing that up to 7 years of prior algal growth 
(10) can be removed by a single sea urchin bite.
Destructive overgrazing was further evidenced 
by the presence of 40- to 60-mm-deep excava- 
tion pits (Fig. 1E) and a relatively reduced algal 
abundance (Fig. 1F) at Attu and the Semichi 
Islands, suggesting that decades to centuries 
of algal growth had already been lost in certain 
places by 2014. After these observations, we 
discovered that coralline algal abundance 
(virtually all C. nereostratum) declined across 
the archipelago during the next 3 years (Fig. 
1F) (2014 versus 2017; 25 reefs among n = 6
islands, n = 4 to 6 sites/island; paired t test:
t = 6.178, df = 24; P < 0.0001) such that among
all 6 islands, reefs lost on average 24% (± 4 SE;
median: 17%) and up to 64% of their total cal- 
cified reef framework over the 3-year period. 
Although marine heat waves occurred across 
the North Pacific in 2014 and 2015 (19), they 
did not produce local temperatures that would 
trigger algal mortality (10). Overgrazing, but
not algal bleaching, was seen during each of 
our annual surveys, indicating that the most 
parsimonious driver of the observed reef de- 
cline was intense bioerosion. 
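
The time equivalents quoted above follow from dividing a grazing depth by the alga's vertical growth rate. A back-of-the-envelope check is sketched below, assuming the ~0.35 mm/year rate applies uniformly across colonies and years; the helper name `years_of_growth` is introduced here purely for illustration.

```python
GROWTH_MM_PER_YEAR = 0.35  # approximate vertical growth rate quoted in the text


def years_of_growth(depth_mm, rate=GROWTH_MM_PER_YEAR):
    """Years of vertical accretion represented by a given erosion depth."""
    return depth_mm / rate


single_bite = years_of_growth(2.5)                  # deepest grazing scars
pits = [years_of_growth(d) for d in (40.0, 60.0)]   # excavation pits
lethal_threshold = years_of_growth(0.25)            # depth defining lethal grazing
print(round(single_bite, 1), [round(p) for p in pits], round(lethal_threshold, 2))
```

Under this assumption a 2.5-mm scar corresponds to about 7 years of growth, and 40- to 60-mm pits to roughly 110 to 170 years, in line with the "decades to centuries" wording.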

When sea otters are present at ecologically 
effective densities (20) [six or more individuals 
per kilometer of coastline (21)], they greatly
suppress the size and abundance of sea urchins 
in the ecosystem (8). The severe bioerosion of 
C. nereostratum that we are now observing 
(Fig. 1) is therefore at least partially caused 
by the functional extinction of this keystone 
predator (table S1) and the resultant prolifer- 
ation of large sea urchins (tables S2 and S3), 
the principal agents of bioerosion (fig. S1). 
However, unlike in past centuries, when sea 
otters went functionally extinct because of the 
maritime fur trade (22), their recent popula- 
tion collapse occurred in tandem with rapid 
ocean warming and acidification due to rising 
atmospheric Pco2 (23). This region has also
experienced several ocean heat waves during 
recent decades (24), including in 2014 and 2015 
(19). The lethal bioerosion that is currently un- 
folding (Fig. 1) could thus be a function of the 
interplay among trophic cascades, ocean warm- 
ing, and ocean acidification. We therefore sought 
to establish how the process of bioerosion has 
changed through time and to determine the 
contributions of each putative driver to that

Fig. 1. Erosion of long-lived coralline algal reefs across the Aleutian
archipelago. (A) Over centuries to millennia, C. nereostratum formed massive 
reefs that structurally underpinned Aleutian kelp forests. (B) However, 
these reefs are now eroding because of overgrazing by sea urchins. (C) Area 
(in square centimeters) of each colony that was grazed to a depth below 
its regenerative layer (gray bar) versus the area that persisted as living tissue
bites on each colony. (E) Depth (in centimeters) of grazing “excavation pits”
on the reef. Bars in (C) to (E) are global means ± SE from each island in
2014 (n = 10 surveys/site; n = 2 sites/island or group; n = 8 sites total). 
(F) Spatial coverage of the coralline algal framework (median and quartiles: 
whiskers indicate 95% confidence intervals) when assessed at n = 6 islands 
(n = 20 quadrats/site; n = 4 to 6 sites/island) in 2014 (dark gray bars), 
2015 (white bars), and 2017 (light gray bars). 


change. To do so, we reconstructed the annual 
frequency of bioerosion over a 40-year period 
(1965-2004) through sea urchin grazing scars 
archived in the skeletons of C. nereostratum at 
Attu, Alaid, Amchitka, and Ogliuga, locations 
that experienced differing levels of sea otter 
recovery after cessation of the fur trade but 
before the recent collapse (22). We then mod- 
eled the degree to which bioerosion rates were 
predicted through space and time by the coin- 
ciding abundance of sea urchins and by sum- 
mer sea surface temperatures (SSTs) [note: 
we focus here on SST because although both 
seawater temperature and pH are changing 
because of rising atmospheric Pco2, only local
reconstructions of SST are available (25)]. 

As expected, bioerosion rates through space 
and time (Fig. 2, A and B, and table S4) were 
predicted by sea urchin biomass, and thus by 
sea otter density (21). At Amchitka and Ogliuga,
islands where sea otters had recovered to near 
carrying capacity and kelp forests had correspondingly
returned by 1965, bioerosion was
negligible from 1965 to 1995 (Fig. 2A) but then 
abruptly increased to high levels thereafter 
when sea otter populations synchronously 
collapsed across the region (22). At Attu, bio- 
erosion was frequent from 1965 to 1970, when 
sea otters were absent. Rates then declined 
over the next 20 years in concert with Attu 
being repopulated by sea otters (12) but returned
again to high levels after the aforementioned
collapse. Bioerosion at Alaid, an
island functionally devoid of sea otters since 
at least 1912, was frequent throughout the 
40-year period. Further, after statistically con- 
trolling for the cascading influence of sea otters 
and sea urchins (Fig. 2B), we discovered that 
bioerosion covaried positively with SST from 
1965 to 2004 (Fig. 2C and table S4). Aleutian 
SSTs have increased on average 0.5°C since 
1965, and this region experienced several 
warming anomalies during the 20th century
(24, 25). Thus, whereas bioerosion was
initiated by the loss of sea otters and the resulting
trophic cascade, seawater warming appears
to have markedly accelerated this process
in recent times. 

To further establish whether sea urchin graz- 
ing has become more lethal to C. nereostratum in 
recent years and to identify the respective roles 
of seawater temperature versus Pco2 in this
process, we cultured C. nereostratum and large
S. polyacanthus for 3 months under an appropriate
suite of temperature and Pco2 treatments
and then measured the effect of each on the 
structural integrity of C. nereostratum and its 
susceptibility to sea urchin grazing. Certain 
treatment combinations mirrored preindustrial, 
present-day, and predicted near-future mean 
summer conditions specific to the Aleutians 
(table S5). Elevating seawater Pco2 reduced the
skeletal density of C. nereostratum, particularly 
when temperature was also increased [Fig. 3A 
and table S6; linear mixed-effects (LME) interaction
term: P = 0.007]. Notably, a similar
inverse relationship between skeletal density
and seawater temperature has been evident 
in wild C. nereostratum since 1914 (16). Elevat- 
ing temperature increased per capita rates of 
lethal grazing (LME: P < 0.001) irrespective 
of Pco2 (P = 0.467) (Fig. 3B and table S7). Net
rates of bioerosion under present-day seawater
conditions (470 µatm Pco2; 6.5 to 8.5°C)
were 34 to 60% higher than those seen under
preindustrial conditions (340 µatm Pco2; 6.5°C),
suggesting that per capita rates of sea urchin- 
induced bioerosion are much higher (and 
thus more lethal) today than they were in the 
18th century, a time when sea otters were 
hunted to near extinction but societies had yet 
to fully industrialize. Our experiment also sug- 
gests that marine heat waves in 2014 and 2015 
likely triggered particularly intense bioerosion 
during those years (i.e., at rates similar to or 
above those seen at 470 µatm Pco2; 8.5°C, a
60% increase from preindustrial). Finally, our 
experiment predicts that without rapid adapt- 
ive evolution in C. nereostratum, an unlikely 
event given that it is long-lived and rarely reproduces
sexually (10), per capita bioerosion
will increase another 17 to 39% by the year 
2100 with the additional seawater warming 
(+2°C) and acidification (+360 to 400 µatm
Pco2) projected for this region (26, 27).
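
The percentages above can be combined into a rough overall figure. The sketch below is an illustration under two assumptions that the text does not state explicitly: that the projected 17 to 39% increase is expressed relative to present-day rates, and that the low and high ends of each range can be paired. The helper name `compound` is hypothetical.

```python
def compound(first_increase, second_increase):
    """Combine two successive fractional increases into one overall increase."""
    return (1.0 + first_increase) * (1.0 + second_increase) - 1.0


# Fractional increases quoted in the text.
present_vs_preindustrial = (0.34, 0.60)
future_vs_present = (0.17, 0.39)  # assumed to be relative to present-day rates

low = compound(present_vs_preindustrial[0], future_vs_present[0])
high = compound(present_vs_preindustrial[1], future_vs_present[1])

# Projected Pco2 if the quoted increments are added to the present-day 470 µatm.
projected_pco2 = (470 + 360, 470 + 400)
print(f"{low:.0%} to {high:.0%} above preindustrial; Pco2 {projected_pco2} µatm")
```

Under these assumptions, per capita bioerosion in 2100 would be roughly 57 to 122% above the preindustrial rate, at a seawater Pco2 of about 830 to 870 µatm.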

Our study reveals that long-lived C. nereostratum 
reefs, which underpin the diversity and stability 
of Aleutian kelp forests, are in rapid decline. 
This decline, initiated by a trophic cascade 
and accelerated by ocean warming and acid- 
ification, would have gone undiscovered had 
we focused solely on the direct effects of cli- 
mate change on C. nereostratum (15). Our study 
reveals that the pathways and pace with which 
climate change is affecting C. nereostratum 
have been, and will continue to be, contingent 
upon the outcomes of species interactions, 
a general dependency that heretofore has 
neither been widely recognized nor well doc- 
umented in nature. In the near term, the re- 
covery of Aleutian sea otter populations 
would effectively buffer this system against 
a climate-induced decline of its structural 
foundation. Without sea otter recovery, sub- 
tle temperature- and pH-induced changes in 
C. nereostratum and S. polyacanthus will 
continue to amplify “interaction strengths” 
within the cascade, likely causing C. nereostratum 
reefs to collapse sooner than expected from 
the direct effects of climate change alone. 
Studying climate change through an ecolog- 
ical lens is therefore necessary (28, 29) to 
properly identify its emergent effects and 
to predict its future impacts. 

Our study also highlights the power of 
trophic cascades in nature (7) and the poten- 
tial for large predators to ameliorate some of 
the effects of climate change in the near term. 

Fig. 2. Historical patterns and
causes of bioerosion. (A) Annual
frequency of sea urchin grazing
scars on C. nereostratum (mean/
5-year period ± SE) from 1965 to
2004 at Attu, Alaid, Amchitka,
and Ogliuga (n = 5/island). Rectangle
depicts the onset of the recent sea
otter decline hindcasted from surveys
(22). Partial effects plots of (B) sea
urchin biomass and (C) SST reveal

Keystone predators are generally thought to
act as “biotic multipliers” of climate change
(30). Our study expands on this view, indicating
that in some cases, keystone predators
may instead serve as “biotic attenuators” of 
change and that these predators will amplify 
or attenuate change only in the places on 
Earth where they remain at ecologically effective
densities (1-3).

Fig. 3. Effects of seawater temperature
and Pco2 on algal integrity and bioerosion
rate. (A) Skeletal density (in milligrams
of CaCO3 per cubic centimeter) of
C. nereostratum when cultured for
4 months under various temperatures
and Pco2 levels, including pairs that
represent preindustrial (P), modern (M),
and predicted near-future (F) conditions
specific to the Aleutian Islands (n =
3/treatment). (B) Rate at which large
S. polyacanthus consumed C. nereostratum
(in milligrams of CaCO3 per day per
square centimeter of alga) during a 20-day
grazing assay plotted as a function
of the treatments that both experienced
for 3 months before and during the
assay (n = 9 to 13/treatment). Bars in
(A) and (B) are means ± SE.

Structural basis of transcription-translation coupling
and collision in bacteria


Michael William Webster, Maria Takacs, Chengjin Zhu, Vita Vidmar,
Ayesha Eduljee, Mo'men Abdelkareem, Albert Weixlbaumer


Prokaryotic messenger RNAs (mRNAs) are translated as they are transcribed. The lead ribosome 
potentially contacts RNA polymerase (RNAP) and forms a supramolecular complex known as the 
expressome. The basis of expressome assembly and its consequences for transcription and translation 
are poorly understood. Here, we present a series of structures representing uncoupled, coupled, and 
collided expressome states determined by cryo-electron microscopy. A bridge between the ribosome 
and RNAP can be formed by the transcription factor NusG, which stabilizes an otherwise-variable 
interaction interface. Shortening of the intervening mRNA causes a substantial rearrangement that 
aligns the ribosome entrance channel to the RNAP exit channel. In this collided complex, NusG linkage is 
no longer possible. These structures reveal mechanisms of coordination between transcription and
translation and provide a framework for future study.


All organisms express genetic information
in two steps. mRNAs are transcribed
from DNA by RNA polymerase (RNAP)
and then translated by ribosomes into
proteins. In prokaryotes, translation begins
as the mRNA is synthesized, and the lead 
ribosome on an mRNA is spatially close to 
RNAP (1, 2). Coordination of transcription
with translation regulates gene expression 
and prevents premature transcription termi- 
nation (3, 4). The trailing ribosome inhibits 
RNAP backtracking, which contributes to the 
synchronization of transcription and translation 
rates in vivo and in vitro (5-7). 

Coordination may also involve physical con- 
tacts between RNAP and the ribosome. The 
conserved transcription factor NusG binds 
RNAP through its N-terminal domain (NusG- 
NTD) and binds ribosomal protein uS10 
through its C-terminal domain (NusG-CTD) 
both in vitro and in vivo (8, 9). Formation of a 
NusG-mediated bridge by simultaneous binding 
has not yet been observed, and the conse- 
quences of physical coupling are unknown. 
RNAP and the ribosome also interact directly 
(10-12), and this complex has recently been 
visualized in situ (13). A transcribing-translating 
expressome complex formed by the collision 
of ribosomes with stalled RNAP in an in vitro 
translation reaction was reconstructed at 7.6-Å
resolution (10). This architecture would not
permit a NusG-mediated bridge. 

We sought to structurally characterize me- 
chanisms of physical transcription-translation 
coupling and resolve the relationship between 
NusG and the collided expressome. Expressomes
were assembled by the sequential addition
of purified Escherichia coli components
(70S ribosomes, tRNAs, RNAP, and NusG) to a 
synthetic DNA-mRNA scaffold (fig. S1, A to C). 
An mRNA with 38 nucleotides separating the 
RNAP active site from the ribosomal P-site was 
chosen to imitate a state expected to precede 
collision (14).

A reconstruction of the expressome was 
obtained at 3.0-Å nominal resolution by cryo-electron
microscopy (cryo-EM) (Fig. 1A; fig.
S1, D and E; and table S1). RNAP and the ribo- 
some do not adopt a single relative orientation 
within the expressome, and focused refine- 
ment was required to attain a reconstruction 
of RNAP at 3.8-Å nominal resolution (Fig. 1A
and fig. S2; see materials and methods). Re- 
fined atomic models collectively present the 
key steps of prokaryotic gene expression in a 
single molecular assembly (Fig. 1B). 

Direct contacts between RNAP and the ri- 
bosome, if they occur, are not stable in this 
complex, and the mRNA is the only consistent 
connection. We characterized the dynamics 
of the complex by plotting the range of RNAP 
positions relative to the ribosome using the 
angular assignments of particles from focused 
reconstructions (Fig. 1C and fig. S3A). RNAP is 
loosely restrained to a plane perpendicular 
to an axis connecting the RNAP mRNA exit 
channel to the ribosomal mRNA entrance chan- 
nel (movie S1). Within this plane, RNAP rotates 
freely. Seven clusters represent a series of pre- 
ferred relative orientations (Fig. 1C and fig. S3A). 

RNAP and ribosome models were placed in 
reconstructions generated from particles in 
clusters 1 to 6, but a large fraction of cluster 7 
was predicted to be incompatible with longer 
upstream DNA (fig. S3, B to F, and table S2; 
see materials and methods). Expressome models 
represent characteristic relative orientations 
for each cluster, and they collectively suggest 
a continuous movement of RNAP relative to
the ribosome surface involving substantial
changes in both rotation (~280°) and translation
(~50 Å) (Fig. 1D and movie S1). The
closest domain of RNAP to the ribosome is the
zinc finger of the β′ subunit (β′-ZF) in all models.
In clusters 1 to 3, the β′-ZF sits within a funnel-shaped
depression between the head, body,
and shoulder domains of the 30S subunit,
bounded by ribosomal proteins uS3, uS4, and
uS5. We predict that RNAP transits from cluster
1 through clusters 2 to 5 to reach positions
exemplified by model 6, where the RNAP β′-ZF
is between uS3 and uS10 on the 30S head
domain.

NusG-NTD is bound to RNAP in expres- 
some cluster 6 but not in clusters 1 and 2 (Fig. 
1E). We determined that a substantial fraction 
of the imaged particles lacked NusG because 
of dissociation during gradient purification 
(fig. S3G). Notably, the predicted position of 
the NusG-CTD bound to uS10 (8, 9) is closest 
to the NusG-NTD bound to RNAP in cluster 6. 

An improved reconstruction of the NusG- 
coupled expressome was obtained from a sam- 
ple prepared with increased NusG occupancy 
(Fig. 2A and fig. S4, A and B; see materials 
and methods). Conformational heterogeneity 
of the ribosome and RNAP was substantially 
reduced, but focused refinement was required 
to obtain well-resolved ribosome and RNAP 
reconstructions (3.4 and 7.6 Å, respectively)
(fig. S4, C to E, and table S1). Continuous den- 
sity in the unfocused reconstruction confirmed 
that NusG bridges RNAP and the ribosome 
(Fig. 2A). We constructed an atomic model of 
the NusG-coupled expressome by fitting and 
refining a ribosome model and docking a pub- 
lished RNAP-NusG-NTD model consistent with 
our map (15) into their consensus positions in 
the unfocused reconstruction (Fig. 2B). 

Additional density corresponding to the 
NusG-CTD bound to uS10 was identified on 
the ribosome, which otherwise closely resembled 
that of the uncoupled expressome. The NusG- 
CTD is a KOW (Kyrpides, Ouzounis, and Woese) 
domain that consists of a five-stranded β barrel.
As in the isolated NusG-uS10 complex
determined by nuclear magnetic resonance
(NMR) (8), strand β4 of NusG aligns with strand
β4 of uS10, thereby forming an extended intermolecular
β sheet (Fig. 2C). However, NusG
and uS10 are substantially closer in the expressome
than they are in the isolated complex
because NusG loops L1 (F141 and F144)
and L2 (I164, F165, and R167) insert into a
hydrophobic pocket of uS10 that is enlarged
by movement of helix α2 (Fig. 2D and fig. S5, A
to D). F165 of NusG, in particular, is embedded 
within uS10. This accounts for its key role in 
binding uS10, which has been identified by 
mutational studies (9). The altered position of 
NusG not only increases the area contacting 
uS10 but avoids clashing with neighboring 
ribosomal protein uS3 (Fig. 2D).

Fig. 1. Structural models of the uncoupled
expressome. (A) Representative cryo-EM two-dimensional
class averages showing conformational
variability (left), and cryo-EM maps of the ribosome
and RNAP in the uncoupled expressome (right).
RNAP is shown in position 2 [see (E)], with measured
rotation and translations of RNAP indicated. tDNA,
template DNA; ntDNA, nontemplate DNA. (B) Atomic
model of the uncoupled expressome in ribbon
representation (left), and the central steps in gene
expression shown by segmented cryo-EM maps with
superimposed atomic coordinates (right). (C) Plot of
RNAP-70S relative orientation with clusters indicating
a series of orientations (1 to 6) distinguished by
rotation of RNAP. Further characterization of expressome
particles resembling cluster 6 (Fig. 2)
revealed that these are likely physically coupled
through NusG. Cluster 7 primarily includes particles
with orientations incompatible with longer upstream
DNA, but it also includes states that have been
characterized by Wang et al. (26). (D) Representative
positions of the RNAP β′-ZF in each expressome
model relative to the ribosome surface. (E) NusG is
present in state 6 (dashed green circle) but not in
state 2. The position of β′-ZF is shown (dashed
purple circle). The focused cryo-EM maps shown are
filtered to 20-Å resolution with fitted coordinates.


Fig. 2. Structural models of the NusG-coupled
expressome. (A) Focused cryo-EM maps of the
ribosome and RNAP in the NusG-coupled expressome.
Inset shows continuous electron density between NusG-NTD
and NusG-CTD domains in an unfocused map
filtered to 8 Å (slice view). (B) Ribbon representations of
the NusG-coupled expressome model. (C) Interaction of
NusG-CTD with ribosomal protein uS10. (D) Structural
superposition with the isolated NusG-uS10 complex
based on alignment to uS10 (gray; PDB code 2KVQ) (left)
and hydrophobic pocket created by conformational
change of uS10 (right). (E) mRNA connecting the
ribosome mRNA entrance channel to the RNAP exit
channel shown by a cryo-EM map filtered to 4 Å and
fitted model. (F) The range of RNAP positions relative
to the ribosome surface determined by multi-body
refinement. Cartoon of two principal components accounting
for 44% of variance (left). Component 1
involves rotation in a plane approximately parallel to
the surface of the ribosome and is limited by clashes
between the β′-ZF of RNAP and either uS10 or h33
(dashed circles). Component 2 is an orthogonal rotation
limited by extension of the flexible NusG linker (residues
117 to 126) in one direction (red through purple to blue
arrows) and by the clash between β′-ZF and uS3 in the
other (dashed circle). Positions of RNAP β′-ZF and
NusG residue Q117 indicate trajectories (red through
purple to blue arrows). Single-letter abbreviations for
the amino acid residues are as follows: A, Ala; C, Cys;
D, Asp; E, Glu; F, Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu;
M, Met; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr;
V, Val; W, Trp; and Y, Tyr.

The NusG-CTD recruits Rho to terminate
the synthesis of untranslated mRNAs (16). In
the NusG-coupled expressome, NusG binds 
uS10 with the same interface it binds Rho, 
which suggests that the events are mutually 
exclusive (fig. S5E) (17). The structure of the 
expressome thereby explains how the trailing 
ribosome is sensed by NusG and how transcrip- 
tion termination is consequently reduced. 

The binding of the NusG-NTD to RNAP 
suppresses backtracking by stabilizing the 
upstream DNA duplex (15, 18). In the expressome,
space for the upstream DNA is further
restricted by an extended channel formed by 
uS10 and NusG (fig. S5F). The interaction of 
the NusG-CTD with uS10 is predicted to reduce 
dissociation of NusG-NTD from RNAP through 
increased avidity (19). The RNAP-NusG com- 
plex within the coupled expressome is likely 
stabilized by the trailing ribosome, and tran- 
scription elongation is consequently favored. 

The mRNA exit channel of RNAP is separated 
from the entrance channel of the ribosome by 
~60 Å. Continuous electron density on the 
solvent side of uS3 allowed the modeling of 
the intervening 12 mRNA nucleotides, which 
completed the mRNA path from synthesis to 
decoding (Fig. 2E and fig. S6, A to C). The in- 
terpretability of the electron density varies con- 
siderably, however, and this model is considered 
one of an ensemble of mRNA conformations. 

The RNAP mRNA exit channel is adjacent 
to uS3 residues R72, K79, and K80, and clear 
electron density for mRNA in this region sug- 
gests a relatively stable contact. The path con- 
tinues to four arginines immediately outside 
the ribosomal mRNA entrance channel (R126, 
R127, R131, and R132) (fig. S6A). R131 and R132 
have been previously identified as imparting 
ribosomal helicase activity (20). The mRNA 
path in this region is close to, but different 
from, that observed previously in structures of 
mRNA-bound ribosomes (21) (fig. S6, D to F). 

Binding of the nascent transcript by uS3 
likely modulates secondary structure forma- 
tion. Structured mRNAs can decrease transla- 
tion rates (22), stabilize transcriptional pauses 
[e.g., the E. coli his pause (23)], or induce 
transcription termination (24). Although the 
ribosome can unwind mRNA secondary struc- 
ture with basic residues in the mRNA en- 
trance channel (20), preventing mRNAs folding 
downstream likely aids translation efficiency. 
We propose that by positioning RNAP in line 
with an extended series of basic residues, NusG 
helps keep nascent mRNAs single stranded 
and thereby enhances the efficiency of both 
transcription and translation. 

No stable contacts are observed between the 
core subunits of RNAP and the ribosome in 
the NusG-coupled expressome. The relative 
position of RNAP and the ribosome varies 
between particles, albeit substantially less than 
the sample with partial NusG occupancy (fig. 
S4, A and B). Analysis of movement by multi- 
body refinement (25) reveals that RNAP is 
constrained to avoid clashes between β′-ZF 
and the cavity formed by uS3, uS10, NusG, 
and helix 33 of 16S rRNA (h33) into which it is 
inserted (Fig. 2F and movie S2). RNAP is also 
flexibly tethered to the ribosome by NusG, 
with the length of the NusG linker (residues 
117 to 125) varying in the range of 14 to 30 Å. 

To test whether lengthening the intervening 
mRNA alters the architecture of the expres- 
some, we imaged two samples with four addi- 
tional mRNA nucleotides (42 in total) separating 
the RNAP active site from the ribosomal P-site 
(fig. S7A and table S1). Saturation with NusG 
increased particle frequencies resembling the 
NusG-coupled expressome with shorter mRNA, 
including density linking the complexes (fig. S7, 
B to E). Compared with shorter mRNA, more 
particles are observed arranged similarly to 
cluster 7 of the uncoupled expressome (Fig. 
1C). This arrangement is termed transcription- 
translation complex C (TTC-C) by Wang et al. 
(26). However, NusG-CTD is bound to uS10 
only in cluster 6 but not cluster 7, which in- 
dicates that NusG couples in only one arrange- 
ment (fig. S7F). 

The mRNA spanning the mRNA exit and 
entrance channels is in an extended confor- 
mation in the NusG-coupled expressome. To 
test whether coupling by NusG is possible 
when the spanning mRNA is shorter, we ob- 
tained a reconstruction of a NusG-containing 


expressome with an mRNA shortened to 34 
nucleotides between the ribosomal P-site and 
the RNAP active site (Fig. 3A, fig. S8, and table 
S1). A model was constructed as described for 
the coupled expressome (Fig. 3B). 

In this model, RNAP is positioned close to 
the ribosome mRNA entrance channel—more 
than 50 Å from its location in the NusG-coupled 
expressome. Consistent with this change, RNAP 
still binds the NusG-NTD but is no longer 
tethered through the NusG-CTD to uS10 be- 
cause the NusG linker (residues 117 to 125; 
maximum extension of ~30 Å) would need to 
span an 85- to 145-Å distance. We determined 
the structure of an equivalent sample lacking 
NusG and confirmed that the position of RNAP 
is very similar in this case (fig. S9A and table 
S1). Therefore, the architecture is not NusG- 
dependent and is similar to particles from clus- 
ters 1 and 2 of the uncoupled expressome (fig. 
S12). We conclude that RNAP coupling to the 
ribosome through NusG requires the P-site to 
be >34 nucleotides from the 3’ end of the mRNA. 
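
The linker argument in this paragraph reduces to a distance comparison. The following is a minimal sketch of that comparison, using only the distances quoted in the text; the function and constant names are illustrative and are not taken from the study.

```python
# Minimal sketch: can a flexible linker of a given maximum extension
# span a required distance? All names here are hypothetical.

MAX_LINKER_EXTENSION_A = 30.0      # ~30 Å maximum extension quoted in the text
COUPLED_RANGE_A = (14.0, 30.0)     # linker lengths observed in the coupled expressome
COLLIDED_RANGE_A = (85.0, 145.0)   # span that would be required with the 34-nt mRNA


def linker_can_span(required_distance_a: float, max_extension_a: float) -> bool:
    """Return True if the required distance does not exceed the maximum extension."""
    return required_distance_a <= max_extension_a


# Every distance in the coupled range is reachable by the linker.
coupled_ok = all(linker_can_span(d, MAX_LINKER_EXTENSION_A) for d in COUPLED_RANGE_A)

# No distance in the collided range is reachable, not even the shortest one.
collided_ok = any(linker_can_span(d, MAX_LINKER_EXTENSION_A) for d in COLLIDED_RANGE_A)

# Ratio of the shortest required span to the maximum extension (85 / 30).
shortfall = COLLIDED_RANGE_A[0] / MAX_LINKER_EXTENSION_A

print(coupled_ok, collided_ok, round(shortfall, 2))
```

Even the shortest required span is nearly three times the maximum linker extension, which is why tethering through the NusG-CTD is excluded in this arrangement.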

The rearrangement of RNAP and the ribo- 
some in our structure with shortened mRNA 
resembles the expressome formed by the col- 
lision of translating ribosomes with stalled 
RNAP [RNAP backbone root mean square 
deviation (RMSD) ~3 Å based on 16S rRNA 
superposition] (10) (fig. S10, A and B, and fig. 
S12). We therefore refer to this molecular 
state as the collided expressome. The previ- 
ous reconstruction was resolved to 7.6 Å, and 
our improved model allowed us to define the 
interaction surfaces of RNAP and the ribo- 
some in even more detail. 

Fig. 3. Structural models of the collided expressome. (A and B) Cryo-EM map and model of the collided 
expressome. (C) Schematic cross section indicating three regions of close contact between RNAP and 
ribosome (indicated by dashed rectangles). (D) Details of the interaction interfaces of RNAP with the 
ribosome. Rectangles 1 to 3 correspond to the dashed rectangles in (C). 

Four regions are in close proximity: uS10 
with the NTD of the RNAP α1 subunit, uS3 
with RNAP subunits α1 and the β-flap domain, 
uS4 with β′-ZF, and uS2 with the RNAP ω 
subunit (Fig. 3, C and D, and fig. S9, B and C). 
However, density for the ω subunit is very 
weak, which suggests that partial or complete 
dissociation occurs upon collision. The contacts 
bury a total surface area of ~3000 Å². However, 
RNAP moves relative to the ribosome, albeit 
less than in the samples previously analyzed 
(fig. S8, B and C). The RNAP-ribosome con- 
tacts are likely transient, so the contact area 
varies. The observed RNAP-ribosome configura- 
tion allows notable structural complementarity 
between the molecular surfaces. 

Rotation of RNAP relative to the ribosome 
beyond the observed position would cause 
steric clashes (Fig. 4A and fig. S11). We hypo- 
thesize that the architecture of the collided 
expressome is the product of structural com- 
plementarity and the energetically favorable 
minimization of mRNA path length. To test 
this, we generated ~18,000 hypothetical ex- 
pressome models representing an exhaustive 
search of RNAP rotations located about the 
mRNA axis at a series of distances along it 
(2° rotational step size, 0.5-Å translational 
step size). After excluding clashing models, 
we found that the shortest mRNA path is 
achieved by the RNAP orientations observed by 
cryo-EM (Fig. 4B). A simple model is therefore 
sufficient to explain the observed orientation 
of RNAP relative to the ribosome: When 
inserting into the mRNA entrance channel 
cavity on the ribosome, RNAP adopts an 
orientation with structural complementarity 
so that the intervening mRNA spans the 
shortest distance. 
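
The search described above can be illustrated with a toy grid enumeration. This is a sketch under stated assumptions, not the procedure used in the study: only the 2° and 0.5-Å step sizes come from the text. The single rotation angle, the number of translation steps (chosen so that 180 x 100 = 18,000 models), and the `clashes` and `mrna_path_length` functions are hypothetical stand-ins for real structural calculations.

```python
import itertools
import math

ROT_STEP_DEG = 2.0   # rotational step size from the text
TRANS_STEP_A = 0.5   # translational step size from the text (Å)
N_TRANS = 100        # assumption: gives 180 x 100 = 18,000 models


def clashes(angle_deg: float, dist_a: float) -> bool:
    """Hypothetical steric test: RNAP may not approach closer than an
    angle-dependent limit that is smallest (20 Å) at 90 degrees."""
    min_allowed = 20.0 + 15.0 * (1.0 - math.cos(math.radians(angle_deg - 90.0)))
    return dist_a < min_allowed


def mrna_path_length(angle_deg: float, dist_a: float) -> float:
    """Hypothetical path length: in this toy model it equals the separation."""
    return dist_a


# Enumerate every (rotation, translation) combination on the grid.
angles = [i * ROT_STEP_DEG for i in range(int(360 / ROT_STEP_DEG))]
dists = [j * TRANS_STEP_A for j in range(N_TRANS)]
models = list(itertools.product(angles, dists))

# Discard clashing models, then pick the one with the shortest mRNA path.
allowed = [m for m in models if not clashes(*m)]
best = min(allowed, key=lambda m: mrna_path_length(*m))

print(len(models), len(allowed), best)
```

In this toy model the orientation with the best surface fit (90°) is also the one that permits the closest approach, so it yields the shortest path, mirroring the logic of the argument in the text.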

We sought to clarify whether expressome 
formation is driven by concurrent binding to 
the same mRNA or whether specific ribosome- 
RNAP contacts contribute. Copurification of 
RNAP with ribosomes was substantially re- 
duced when the mRNA did not support con- 
current ribosome binding, but RNAP that 
lacked DNA or mRNA entirely (RNAP-core) 
bound ribosomes more stably (Fig. 4C and 
fig. S10, C and D). This observation was pre- 
viously thought to indicate that expressome 
formation is not driven by shared mRNA 
(10, 12). 

To examine this, we imaged samples as- 
sembled without further purification and 
lacking nucleic acid scaffolds (RNAP-core-70S) 
by negative stain electron microscopy (EM). 
No expressomes formed, which suggests that 
their formation is driven by concurrent mRNA 
binding and that direct interactions play minor 
roles. However, we observed at least two alter- 
native RNAP binding sites (Fig. 4D). The sites 
can be described only approximately from these 
data, but one (site I) is consistent with an inter- 
action with ribosomal protein uS2 observed 
in a core RNAP-30S complex (11). Saturation of 
ribosomes with ribosomal protein bS1, which 
has no effect on expressome formation (fig. 
S13A), abolished the occupancy of site I without 
affecting the second site (site II). The addition 
of a nucleic acid scaffold containing just a short 
mRNA (minimal scaffold) abolished occupancy 
of site II only, whereas addition of both (short 
mRNA scaffold and bS1) abolished both (fig. 
S13). A potential biological role has yet to be 
shown, but the existence of additional 70S-RNAP 
contact modes highlights the complexity of their 
interaction. 

Thus, the expressome is mRNA-linked and 
consequently dynamic. A level of structural 
independence may be required to accommo- 
date internal movements that occur during 
the reaction cycle of each complex. Coupling 
by NusG restrains RNAP motions and hap- 
pens at variable RNAP-ribosome distances (fig. 
S12), but not when they collide (Fig. 4E). Rela- 
tive orientations of the two machineries change 
in prevalence as a function of their separation 
(fig. S12). Notably, translation factor binding is 
compatible with all the observed RNAP ori- 
entations. The role of the presented structures 
in vivo remains to be investigated, and this 
study provides a basis for elucidating the 
role of coupling in gene expression, and its reg- 
ulation by transcription factors and regulatory 
mRNA structures. 

Structural basis of transcription-translation coupling 


Chengyuan Wang, Vadim Molodtsov, Emre Firlar, Jason T. Kaelber, Gregor Blaha, 
Min Su, Richard H. Ebright 


In bacteria, transcription and translation are coupled processes in which the movement of 
RNA polymerase (RNAP)-synthesizing messenger RNA (mRNA) is coordinated with the movement of 
the first ribosome-translating mRNA. Coupling is modulated by the transcription factors NusG 
(which is thought to bridge RNAP and the ribosome) and NusA. Here, we report cryo-electron 
microscopy structures of Escherichia coli transcription-translation complexes (TTCs) containing 
different-length mRNA spacers between RNAP and the ribosome active-center P site. Structures of 
TTCs containing short spacers show a state incompatible with NusG bridging and NusA binding 
(TTC-A, previously termed “expressome”). Structures of TTCs containing longer spacers reveal a 
new state compatible with NusG bridging and NusA binding (TTC-B) and reveal how NusG bridges 
and NusA binds. We propose that TTC-B mediates NusG- and NusA-dependent transcription- 
translation coupling. 


Bacterial transcription and bacterial trans- 
lation occur in the same cellular com- 
partment, occur at the same time, and 
are coordinated processes in which the 
rate of transcription by the RNA poly- 
merase (RNAP) molecule synthesizing an mRNA 
is coordinated with the rate of translation by 
the first ribosome (“lead ribosome”) translat- 
ing the mRNA [(1-9) but see (10)]. Data indicate 
that the coordination is mediated by tran- 
scription elongation factors of the NusG/RfaH 
family, which contain an N-terminal domain 
(N) that interacts with RNAP β′ and β subunits 
and a flexibly tethered C-terminal domain (C) 
that interacts with ribosomal protein S10. These 
factors are thought to bridge, and thereby con- 
nect, the RNAP molecule and the lead ribo- 
some (2, 5-9). Further data indicate that the 
coordination is modulated by the transcrip- 
tion elongation factor NusA (11). 

Cramer and colleagues recently reported a 
7.6-Å-resolution cryo-electron microscopy 
(cryo-EM) structure of an Escherichia coli 
transcription-translation complex (TTC) termed 
the “expressome,” obtained by halting a transcrip- 
tion elongation complex (TEC) and allowing 
a translating ribosome to collide with the halted 
TEC (12). However, the mRNA molecule in 
the structure was not fully resolved, preclud- 
ing determination of the number of mRNA 
nucleotides between the TEC and the ribosome 
active center in the structure (12), and the 
functional relevance of the structure has been 
challenged due to its genesis as a collision 
complex, and due to its incompatibility with 
simultaneous interaction of NusG-N with RNAP 
and NusG-C with the ribosome (6-9). Demo et al. 
recently reported an ~7-Å-resolution cryo-EM 
structure of a complex of E. coli RNAP and a 
ribosome 30S subunit (13). However, the struc- 
ture did not contain mRNA, did not position 
RNAP close to the 30S mRNA entrance portal, 
and was incompatible with the simultaneous 
interaction of NusG-N with RNAP and NusG-C 
with the ribosome (13). 

Here, we report cryo-EM structures of E. coli 
TTCs containing defined-length mRNA spacers 
between the TEC and the ribosome active- 
center product site (P site), in both the pres- 
ence of NusG and absence of NusG (Fig. 1, figs. 
S1 to S5, and tables S1 and S2). We prepared 
synthetic nucleic acid scaffolds that contained 
(i) DNA and mRNA determinants that direct 
formation of a TEC upon interaction with 
RNAP, (ii) an mRNA AUG codon that enables 
formation of a translation complex having the 
AUG codon positioned in the ribosome active- 
center P site upon interaction with a ribosome 
and tRNAfMet, and (iii) an mRNA spacer having 
a length (n) of 4, 5, 6, 7, 8, 9, or 10 codons (12, 15, 
18, 21, 24, 27, or 30 nt, respectively) between 
(i) and (ii) (Fig. 1A). We then incubated the nu- 
cleic acid scaffolds with RNAP, with ribosome 
and tRNAfMet, and optionally with NusG and/or 
NusA, and determined structures by single- 
particle reconstruction cryo-EM (see the ma- 
terials and methods). With nucleic acid scaffolds 
having short spacers (n = 4, 5, 6, 7, or 8), we 
obtained structures matching the expressome 
of (12) (TTC-A; Figs. 1B, left, and 2; figs. S1 to 
S3; and table S1). However, with nucleic acid 
scaffolds having longer mRNA spacers (n = 
8, 9, or 10), we obtained structures of a new 
molecular assembly with features strongly 
suggesting that it functionally mediates NusG- 
dependent, NusA-dependent transcription- 
translation coupling in cells (TTC-B; Figs. 1B, 
center and right, 3, and 4; figs. S3 to S8; table 
S1; and movies S1 and S2). 
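
The scaffold design and its outcome can be summarized compactly. The sketch below restates the spacer lengths and the states reported for each in the paragraph above; the function names are illustrative only, and the lookup reflects the observations described here rather than any predictive model.

```python
# Spacer lengths (in codons) of the seven scaffolds described in the text.
SPACER_CODONS = [4, 5, 6, 7, 8, 9, 10]


def spacer_nt(n_codons: int) -> int:
    """Convert a spacer length in codons to nucleotides (3 nt per codon)."""
    return 3 * n_codons


def observed_states(n_codons: int) -> list:
    """States reported for a given spacer length: TTC-A for n = 4 to 8,
    TTC-B for n = 8 to 10 (TTC-B only in the presence of NusG)."""
    states = []
    if 4 <= n_codons <= 8:
        states.append("TTC-A")
    if 8 <= n_codons <= 10:
        states.append("TTC-B")
    return states


for n in SPACER_CODONS:
    print(n, spacer_nt(n), observed_states(n))
```

Only the 8-codon spacer yields both states, which marks it as the boundary between the two regimes.
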

TTC-A was obtained with nucleic acid scaf- 
folds having mRNA spacers of 4, 5, 6, 7, or 
8 codons, but not with longer mRNA spacers 
(Figs. 1B, left, and 2; figs. S1 to S3; and table 
S1). TTC-A was obtained in both the absence 
and presence of NusG (figs. S1 to S3 and table 
S1). EM density maps of 3.7 to 6.3 Å resolution 
were obtained (~7 and ~3.5 Å local resolution 
for TEC and ribosome, respectively, in the best 
maps), enabling unambiguous rigid-body dock- 
ing of atomic structures of TEC, the ribosome 
30S subunit, the ribosome 50S subunit with 
tRNA in the active-center P site and exit site 
(E site), and, where present, NusG-N, followed 
by manual fitting of residues in the RNAP- 
ribosome interface and in DNA and mRNA 
(Figs. 1B, left, and 2; figs. S1 to S3; and table S1). 

Unexpectedly, in TTC-A, the spatial rela- 
tionship of RNAP relative to the ribosome 
was identical in structures obtained with 
nucleic acid scaffolds having mRNA spacer 
lengths of 4, 5, 6, 7, and 8 codons (fig. S1F). 
High-resolution data for TTC-A revealed that 
differences in mRNA spacer length are accom- 
modated through differences in extents of 
compaction of mRNA in the RNAP RNA-exit 
channel and RNAP-ribosome interface (Fig. 2B). 
As mRNA spacer length increases from 4 to 5 
to 6 codons, the number of mRNA nucleotides 
in the RNAP RNA-exit channel and RNAP- 
ribosome interface increases from 7 nt (5 nt in 
exit channel; 2 nt in interface) to 10 nt (7 nt 
in exit channel; 3 nt in interface) to 13 nt (9 nt 
in exit channel; 4 nt in interface) (Fig. 2B, 
subpanels 1 to 3), respectively. When the mRNA 
spacer length increases to 7 or 8 codons, 16 or 
19 nt of mRNA, respectively, are accommodated 
in the RNAP RNA-exit channel and RNAP- 
ribosome interface, and the 16 or 19 nt of mRNA 
show disorder, indicating that they adopt an 
ensemble of different conformations (Fig. 2B, 
subpanels 4 and 5). The volume of the RNAP 
RNA-exit channel and RNAP-ribosome inter- 
face cannot accommodate more than ~19 nt 
of mRNA without changing the conforma- 
tion of the former or disrupting the latter. We 
suggest that this accounts for our observa- 
tions that TTC-A is obtained at relatively low 
particle populations with a nucleic acid scaf- 
fold having an mRNA spacer length of 8 codons 
(18% versus 91% for a nucleic acid scaffold 
having an mRNA spacer length of 4 codons; 
figs. S1 and S3) and is not obtained with nu- 
cleic acid scaffolds having mRNA spacer lengths 
>8 codons (figs. S4 and S5). The mRNA spacers 
analyzed in this work contained only U (Fig. 
1A); because U is the RNA nucleotide having 
the smallest volume, the mRNA spacer length 
cut-off of 8 codons observed in this work is 
likely to represent an upper bound. 
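
The nucleotide counts quoted in this paragraph follow a simple pattern: each reported total equals 3n - 5 for a spacer of n codons. That relation is an observation fitted to the reported numbers, not a formula stated in the text. The sketch below checks it against the reported values and applies the ~19-nt capacity mentioned above; all names are illustrative.

```python
# Reported mRNA nucleotides in the RNA-exit channel plus RNAP-ribosome
# interface, keyed by spacer length in codons (values from the text).
REPORTED_NT = {4: 7, 5: 10, 6: 13, 7: 16, 8: 19}

CAPACITY_NT = 19  # approximate capacity quoted in the text


def compacted_nt(n_codons: int) -> int:
    """Empirical fit to the reported values (an assumption, not from the text)."""
    return 3 * n_codons - 5


def fits_ttc_a(n_codons: int) -> bool:
    """True if the compacted mRNA stays within the quoted capacity."""
    return compacted_nt(n_codons) <= CAPACITY_NT


for n in range(4, 11):
    print(n, compacted_nt(n), fits_ttc_a(n))
```

Under this fit, a 9-codon spacer would require 22 nt, above the quoted capacity, consistent with TTC-A not being obtained for spacers longer than 8 codons.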

In TTC-A, the RNAP-ribosome interface is 
extensive (3742 Å² buried surface area) and 
involves contacts of RNAP β′ zinc-binding do- 
main (ZBD), RNAP β flap, and RNAP αI with 
ribosomal proteins S4, S3, and S10, respec- 
tively (Fig. 2, C and D). 

In EM density maps of TTC-A, density is 
absent for the RNAP ω subunit, indicating 
that this subunit is either absent, at a low oc- 
cupancy level, or disordered (Fig. 2E). Molec- 
ular modeling suggests that, if RNAP ω were 
present and fully folded, then the C-terminal 
α-helix of ω would clash with the ribosome 
(Fig. 2E). 

In EM density maps of TTC-A obtained in 
the presence of NusG, EM density is present 
for NusG-N (residues 1 to 118) at its expected 
binding location on the RNAP β′ clamp helices 
and the RNAP β pincer tip [(14); Fig. 1B, left, and 
fig. S1H], but is absent for the NusG linker and 
NusG-C, consistent with unrestricted motion 
of the linker and NusG-C relative to NusG-N 
[(14); Fig. 1B, left]. Density maps for TTC-A 
obtained in the absence of NusG are identi- 
cal to those obtained in the presence of NusG, 
except that density for NusG-N is missing 
(fig. S2). Model building indicates that the 
shortest sterically allowed distance between 
NusG-N bound to RNAP and NusG-C mod- 
eled as bound to its molecular target on the 
ribosome, ribosomal protein S10 (2, 5-9), is 
160 Å in TTC-A; this is 1.9 times the maxi- 
mum length of the NusG linker, indicating that 
TTC-A is incompatible with NusG bridging of 
RNAP and S10 (fig. S9A). 

Molecular modeling indicates that TTC-A is 
also incompatible with other known func- 
tional properties of transcription elongation, 
pausing, and termination in E. coli. TTC-A is 
sterically incompatible with binding of NusA 
[(15); fig. S10A], formation of a 21-Q antiter- 
mination complex [(16, 17); fig. S10B], and 
formation of pause and termination RNA 
hairpins [(15, 18, 19); fig. S10, C and D]. TTC-A 
also appears to be incompatible with ribo- 
some 30S head swiveling, the 21° rotation of 
the ribosome 30S head relative to the ribo- 
some 30S body that occurs during ribosome 
translocation [(20-22); fig. S11A and movie 
S3]. The RNAP-ribosome interface in TTC-A 
spans the 30S head and 30S body in the un- 
swiveled state (Fig. 2C and fig. S11A, left) and is 
expected to be disrupted upon swiveling (loss 
of 1972 Å² buried surface area; fig. S11A, 
right). The finding that TTC-A, the “expres- 
some” of (12), lacks the RNAP ω subunit, is 
incompatible with NusG bridging, and is in- 
compatible with known functional properties 
of transcription and translation in E. coli indi- 
cates that TTC-A is unlikely to be functionally 
relevant to transcription-translation coupling 
under most conditions in E. coli. We propose 
that TTC-A is either (i) a specialized complex 
that mediates transcription-translation cou- 
pling under specialized circumstances (e.g., 
transcription-translation coupling by RNAP 
deficient in ω or by ribosomes inactive in trans- 
location) or (ii) an anomalous complex formed 
when the mRNA spacer between RNAP and 
ribosome is anomalously short (e.g., “collision- 
ome” or “crash-ome”). 

Fig. 1. Structure determination of TTCs. (A) Nucleic acid scaffolds. Each scaffold comprises nontemplate- 
and template-strand oligodeoxyribonucleotides (black) and one of seven oligoribonucleotides having a 
spacer length n of 4, 5, 6, 7, 8, 9, or 10 codons (red), corresponding to mRNA. Dashed black box labeled 
“TEC” denotes the portion of nucleic acid scaffold that forms the TEC upon addition of RNAP (28 nt 
nontemplate- and template-strand DNA segments comprising an upstream duplex, a “transcription bubble,” 
and a downstream duplex; 9 nt of mRNA engaged with template-strand DNA as an RNA-DNA “hybrid”; 
and 5 nt of mRNA, on the diagonal, in the RNAP RNA-exit channel); dashed black lines labeled “ribosome 
P-site” denote the mRNA AUG codon intended to occupy ribosome active-center P site upon addition of the 
ribosome and tRNAfMet. “Spacer” denotes the mRNA spacer between the TEC and the AUG codon in the 
ribosome active-center P site. (B) Cryo-EM structures of NusG-TTC-A (obtained with spacer lengths of 
4 to 8 codons), NusG-TTC-B (obtained with spacer lengths of 8 to 10 codons), and NusA-NusG-TTC-B 
(obtained with spacer lengths of 8 to 10 codons). Structures shown are NusG-TTC-A (3.7 Å; n = 4; table S1), 
NusG-TTC-B (4.7 Å; n = 9; table S1), and NusA-NusG-TTC-B2 (3.5 Å; n = 8; table S1). Images show EM density 
(gray surface) and fit (ribbons) for TEC, NusG, and NusA (at top; direction of transcription, defined by 
downstream-duplex DNA, indicated by arrow in left panel and directly toward viewer in center and right 
panels) and for ribosome 30S and 50S subunits and P- and E-site tRNAs (at bottom). RNAP β′, β, αI, αII, 
and ω subunits are in pink, cyan, light green, dark green, and gray, respectively; 30S subunit, 50S 
subunit, P-site tRNA, and E-site tRNA are in yellow, gray, green, and orange, respectively; DNA nontemplate 
strand, DNA template strand, and mRNA are in black, blue, and brick red (brick-red dashed line where 
modeled), respectively. NusG, NusA, and ribosomal protein S10 are in red, light blue, and magenta, 
respectively. The ribosome L7/L12 stalk is omitted for clarity in this and all subsequent images. 

Fig. 2. Cryo-EM structure of NusG-TTC-A. (A) Structure of NusG-TTC-A (3.7 Å; 
n = 4; table S1). Two orthogonal views are shown. Colors are as in Fig. 1B. 
(B) Accommodation of mRNA spacer lengths of 4, 5, 6, 7, and 8 codons in NusG- 
TTC-A. EM density, blue mesh; mRNA, brick red (disordered mRNA nucleotides 
indicated by dashed ovals); template-strand DNA in the RNA-DNA hybrid, blue; 
RNAP active-center catalytic Mg²⁺, purple sphere; tRNA in ribosome P site, 
green. Upper and lower black horizontal lines indicate the edges of RNAP and the 
ribosome. (C) RNAP-ribosome interface in NusG-TTC-A (n = 4; identical interface 
for n = 5, 6, 7, or 8), showing RNAP β′ ZBD (pink; Zn²⁺ ion as black sphere), 
RNAP β flap (cyan), RNAP β flap tip helix (β FTH; disordered residues indicated by 
cyan dashed line), and RNAP αI (green) interacting with ribosomal proteins S4 
(forest green), S3 (orange), and S10 (magenta) and with mRNA (brick red). 
Portions of RNAP β′ and ribosome 30S not involved in interactions are shaded 
pink and yellow, respectively. (D) RNAP-ribosome interactions involving RNAP 
β′ ZBD and S4 (subpanel 1), RNAP β flap and S3 (subpanel 2; β FTH, dashed 
line; β and S3 residues that interact with mRNA, cyan and orange spheres with 
red outlines; mRNA, brick red), RNAP αI and S3 (subpanel 3), and RNAP αI 
and S10 (subpanel 4). Other colors are as in (C). (E) Absence of EM density 
for the RNAP ω subunit. EM density, blue mesh; atomic models for RNAP β′ and 
S2, pink ribbon and forest-green ribbon, respectively; location of missing 
EM density for ω, dashed oval; ω in TEC in absence of ribosome [PDB 6P19; (17)], 
white ribbon. 

TTC-B was obtained with nucleic acid scaf- 
folds having mRNA spacer lengths of 8, 9, or 
10 codons but not with shorter mRNA spacers 
(Figs. 1B, 3, and 4; figs. S3 to S7; and table S1). 
TTC-B was obtained only when NusG was 
present (figs. S3 to S8 and table S1) and was 
obtained both without and with bound NusA 
(figs. S3 to S7 and table S1). TTC-B differs from 
TTC-A by translation of RNAP relative to the 
ribosome by ~70 Å and rotation of RNAP rela- 
tive to the ribosome by ~180° (Fig. 1B and movie 
S1). EM density maps at 3.1 to 12.6 Å resolution 
were obtained (~7 and ~3 Å local resolution for 
TEC and ribosome, respectively, in the best 
maps), enabling unambiguous rigid-body dock- 
ing of atomic structures of components, fol- 
lowed by manual fitting (Figs. 3 and 4 and figs. 
S3 to S7). TTC-B is identical to the NusG-bridged 
complex reported in a preprint by Weixlbaumer 
and colleagues (23) and is different from the 
NusA-containing complex reported in a pre- 
print by Mahamid and colleagues (24). 
Unlike in TTC-A, where the RNAP RNA-exit 
channel is coupled directly to the ribosome 
mRNA-entrance portal, in TTC-B, the RNAP 
RNA-exit channel is separated by ~60 A from 
the ribosome mRNA-entrance portal (Fig. 1B). 
In TTC-B, a ~60-Å, ~11-nt mRNA segment 
connects the RNAP RNA-exit channel and the 
ribosome mRNA-entrance portal, running along 
the surface of the ribosome 30S head, making 
favorable electrostatic interactions with posi- 
tively charged residues in ribosomal protein 
S3 and RNAP β′ ZBD (Figs. 3B and 4B and figs. 
S3F and S4G). The requirement for this ad- 
ditional ~11-nt mRNA segment accounts for 
the fact that TTC-B is obtained only with nu- 
cleic acid scaffolds having mRNA spacer lengths 
≥8 codons. 

Fig. 3. Cryo-EM structure of NusG-TTC-B. (A) Structure of NusG-TTC-B 
(4.7 Å; n = 9; table S1). Views and colors are as in Fig. 2A. (B) Accommodation 
of mRNA spacer lengths of 8, 9, and 10 codons in NusG-TTC-B. EM density, 
blue mesh; mRNA, brick red (disordered mRNA nucleotides indicated by 
dashed ovals); template-strand DNA in RNA-DNA hybrid, blue; RNAP active- 
center catalytic Mg²⁺, purple sphere; tRNA in ribosome P site, green; 
ribosomal protein S3, orange (positively charged residues positioned to 
contact mRNA as orange spheres); RNAP β′ ZBD (pink; Zn²⁺ ion as black 
sphere; positively charged residues positioned to contact mRNA as pink 
spheres). Upper and lower black diagonal lines indicate the edges of RNAP 
and the ribosome. (C) RNAP-ribosome interface and NusG bridging in 
NusG-TTC-B (n = 9; identical interface for n = 8, 9, or 10). RNAP β′ ZBD 
(pink; Zn²⁺ ion as black sphere) interacts with ribosomal protein S3 (orange) 
and mRNA (brick red). NusG (red) bridges RNAP and the ribosome, with 
NusG-N interacting with RNAP and NusG-C interacting with ribosomal protein 
S10 (magenta). Portions of RNAP β′, β, and ribosome 30S not involved in 
interactions are shaded pink, cyan, and yellow, respectively. (D) As in (C), 
showing cryo-EM density as blue mesh. 

In TTC-B, the spatial relationship of RNAP 
relative to the ribosome is identical in struc- 
tures obtained with mRNA spacer lengths of 
8, 9, and 10 codons (figs. S4F and S5G). As in 
TTC-A, differences in mRNA spacer length 
in TTC-B are accommodated 
through differences in extents of compaction 
of mRNA in the RNAP RNA-exit channel (Figs. 
3B and 4B). As mRNA spacer length increases 
from 8 to 9 to 10 codons, the number of mRNA 
nucleotides in the RNAP RNA-exit channel 
increases from ~8 to ~11 to ~14 nt, respectively 
(disordered in each case; Figs. 3B and 4B). 
Assuming that the volume of the RNAP RNA- 
exit channel allows it to accommodate up to 
~15 nt of mRNA (see above), it seems likely 
that mRNA spacer lengths up to ~10 to 11 
codons could be accommodated in TTC-B. 
Noting that the mRNA segments in the RNAP- 
ribosome interface and near ribosomal pro- 
tein S3 in TTC-B are solvent accessible, it also 
seems possible that longer, possibly much 
longer, mRNA spacer lengths could be accom- 
modated by looping of, or secondary structure 
formation in, these mRNA segments. 

In TTC-B, the interaction between RNAP 
and the ribosome is small, involving only con- 
tact between the RNAP β′ ZBD sequence and 
ribosomal protein S3 (224-Å² buried surface 
area; Figs. 3, C and D, and 4, C to E). 

In TTC-B, the RNAP-ribosome interaction is 
supplemented by bridging of RNAP and the 
ribosome by NusG, involving simultaneous 
binding of NusG-N to RNAP and binding of 
NusG-C to ribosomal protein S10 (1409-Å² 
buried surface area for NusG-C and S10; Figs. 
1B, 3, A, C, and D, and 4, A and C to E; and figs. 
S3G, S4H, and S9, B and C). NusG-C interacts 
with S10 in the manner expected from pub- 
lished structures of a complex of NusG and S10 
and of a complex of NusG and a ribosome [(2, 9); 
figs. S3G and S4H]. EM density maps show 
unambiguous density for NusG-N, NusG-C, 
and most residues of the NusG linker (Figs. 
3D and 4D and figs. S3G and S4H), and, at 
lower contour levels, show density for all 
residues of the NusG linker (Figs. 3C and 
4C). Corresponding EM maps obtained in 
the absence of NusG do not show TTC-B 
(fig. S8), indicating that NusG bridging is 
functionally important for the formation 
and/or stability of TTC-B. The NusG bridging 
hypothesized in (2) and (9) is thus unequivocally 
verified. 

We first obtained structures of TTC-B in the 
presence of NusG and absence of NusA (NusG- 
TTC-B; Fig. 3, figs. S38 and S4, and table S1). 
Molecular modeling indicated that NusG-TTC-B 
potentially could accommodate binding of NusA 
(fig. S10A). Therefore, we sought, and obtained, 

Fig. 4. Cryo-EM structure of NusA-NusG-TTC-B. (A) Structure of NusA-NusG-TTC-B
(NusA-NusG-TTC-B2; 3.5 Å; n = 9; table S1). NusA, light blue. Views and other
colors are as in Figs. 2A and 3A. (B) Accommodation of mRNA spacer lengths
of 8, 9, and 10 codons in NusA-NusG-TTC-B. Views and colors are as in Fig. 3B.
(C) RNAP-ribosome interface, NusG bridging, and NusA binding in NusA-NusG-TTC-B
(n = 9; identical interface for n = 8, 9, or 10). RNAP β′ ZBD (pink; Zn²⁺ ion as
black sphere) interacts with ribosomal protein S3 (orange) and mRNA (brick red).
NusG (red) bridges RNAP and ribosome, with NusG-N interacting with RNAP and
NusG-C interacting with ribosomal protein S10 (magenta). NusA (light blue)
KH1 domain interacts with ribosomal proteins S5 and S2 (brown and forest
green, respectively). Portions of RNAP β′, β, ω, and ribosome 30S not involved in
interactions are shaded pink, cyan, gray, and yellow, respectively. (D) As in (C),
showing cryo-EM density as blue mesh. (E) RNAP-ribosome interactions involving
RNAP β′ ZBD and S3 (subpanel 1) and NusG-ribosome interactions involving
NusG-C and S10 (subpanel 2). (F) NusA-ribosome interactions involving NusA KH1
and S5 and S2 (subpanel 1) and NusA-RNAP interactions involving NusA-N and
RNAP β FTH (subpanel 2; β FTH residue that interacts with mRNA, cyan sphere with
red outline; mRNA, brick red), NusA AR2 and RNAP αCTDᴵ (subpanel 3), and NusA-N
and RNAP αCTDᴵᴵ (subpanel 4). (G) Points of flexibility in NusA-NusG-TTC-B
(NusA coupling pantograph): flexible linkage in NusA structure (AR1-AR2 linker; light


corresponding structures of TTC-B in the presence of both NusG and NusA
(NusA-NusG-TTC-B; Fig. 4, fig. S5, and table S1). Compared
to structure determination of TTC-B in the absence of NusA, structure
determination in the presence of NusA was associated with substantially
higher particle populations (4% versus 45% for n = 8, 18% versus 28% for n = 9,
and 17% versus 40% for n = 10) and substantially higher resolutions
(12.6 Å versus 3.1 Å for n = 8, 4.7 Å versus 4.2 Å for n = 9, and 5.0 Å
versus 3.7 Å for n = 10), indicating that NusA
functionally stabilizes TTC-B. Three NusA-NusG-TTC-B
subclasses were obtained: TTC-B1, TTC-B2, and TTC-B3,
differing by up to 15° rotation of
RNAP relative to NusA and ribosome (figs. S5
and S7, A and B, and movie S2). 

In all NusA-NusG-TTC-B subclasses, RNAP
and NusG interact with the ribosome 30S head,
with RNAP β′ ZBD contacting ribosomal protein S3
and NusG contacting ribosomal protein S10
(Fig. 4, C to E, and fig. S6), essentially
as in the absence of NusA (Fig. 3, C and D). 

In all NusA-NusG-TTC-B subclasses, NusA
makes identical, and extensive, interactions
with the surface of the ribosome 30S body,
involving contacts between NusA KH1 domain
and ribosomal proteins S2 and S5 (1755-Å²
buried surface area; Fig. 4, C to E, and fig. S6).
The NusA-ribosome interactions observed
here show no similarity to the putative
NusA-ribosome interactions reported in (24); the
orientation of NusA relative to the ribosome
differs by ~180°, and the interactions involve
a different module of the ribosome 30S subunit
(body versus head). 

NusA functions in this context as a large,
70 Å × 50 Å, open rectangular frame that connects
RNAP to the ribosome 30S body (Fig. 4G
and fig. S7C). One side of the NusA rectangular
frame interacts with the ribosome 30S body,
and three corners of the NusA rectangular
frame interact with RNAP, contacting the RNAP
αᴵ C-terminal domain (αCTDᴵ), the RNAP αᴵᴵ
C-terminal domain (αCTDᴵᴵ), and the RNAP β
flap-tip helix (FTH) (Fig. 4, F and G, and fig.
S7C). The NusA rectangular frame contains an
internal flexible linkage, the AR1-AR2 linker
(light blue circles in Fig. 4G and fig. S7C), and
interacts with RNAP through three flexibly
linked modules: αCTDᴵ and αCTDᴵᴵ, which
are connected to the rest of RNAP through
long, flexible linkers [(25); lines in Fig. 4G
and fig. S7C], and β FTH, which is connected
to the rest of RNAP through flexible connectors
[(15-18); black circle in Fig. 4G and fig.
S7C]. The internal flexibility and flexible
connections enable the NusA-RNAP subcomplex
to maintain constant contact with the ribosome
30S body despite differences in the orientation
of RNAP relative to the ribosome 30S
body (fig. S7C and movie S2). We refer to the
NusA rectangular frame as the “coupling pantograph,”
analogizing it to an electric-railway
coupling pantograph, the open rectangular
frame with internal flexibility and flexible
connections that enables a locomotive to
maintain constant contact with a power cable
despite differences in orientation of the locomotive
relative to the cable [(26); fig. S7C and
movie S2]. 

The separation between the RNAP RNA-exit 
channel and the ribosome mRNA entry portal 
in TTC-B, together with the open character of 
the NusA rectangular frame (coupling panto- 
graph) in TTC-B, provides largely unrestricted 
access for transcriptional-regulatory factors to 
bind, and transcriptional-regulatory RNA sec- 
ondary structures to form, at and adjacent to 
the mouth of the RNAP RNA-exit channel 
(fig. S10). Molecular modeling indicates that
TTC-B, unlike TTC-A, can accommodate formation
of the 21-Q antitermination complex
[(16, 17); fig. S10B] and can accommodate
formation of pause and termination RNA hairpins
[(15, 18, 19); fig. S10, C and D]. In NusA-NusG-TTC-B,
positively charged residues of the
NusA N and S1 domains are positioned to make 
favorable electrostatic interactions with the 
hairpin loop of a pause or termination RNA hair- 
pin, and thereby potentially to nucleate for- 
mation of a pause or termination RNA hairpin 
[(15); fig. S7D]. The different orientations of 
NusA N and S1 domains in NusA-NusG-TTC-B 
subclasses B1, B2, and B3 possibly enable in- 
teractions with different-length pause and ter- 
mination RNA hairpins, with B1 accommodating 
shorter hairpins and B2 and B3 accommodat- 
ing longer hairpins (fig. S7D). 

Molecular modeling also indicates that TTC-B,
unlike TTC-A, is compatible with ribosome
30S head swiveling, the rotation of the 30S
head relative to the 30S body that occurs during
ribosome translocation [(20-22); fig. S11, B
and C, and movies S4 and S5]. In NusG-TTC-B,
all RNAP-ribosome and NusG-ribosome interactions
involve the ribosome 30S head;
accordingly, 30S head swiveling can be accommodated
by rotation of RNAP and NusG
with the 30S head (fig. S11B, center, and movie
S5) and/or by separate rotation of flexibly connected
RNAP β′ ZBD and flexibly connected
NusG-C with the 30S head (fig. S11B, right, and
movie S4). In NusA-NusG-TTC-B, NusA-ribosome 
blue circle), three flexible linkages between NusA and RNAP (αCTDᴵ linker, αCTDᴵᴵ
linker, and β FTH connectors; black lines and black circle), flexible linkage
between RNAP and ribosome (β′ ZBD connectors; black circle), and flexible NusG
bridging of RNAP and ribosome (NusG linker; red circle). 


interactions involve the ribosome 30S body, and 
RNAP-ribosome and NusG-ribosome interactions 
involve the ribosome 30S head; nevertheless, 
exploiting the internal flexibility and flexible 
connections of the NusA-RNAP coupling pan- 
tograph, 30S head swiveling can be accommo- 
dated by rotation of RNAP and NusG with the 
30S head (fig. S11B, center, and movie S5) 
and/or by separate rotation of flexibly connected
RNAP β′ ZBD and flexibly connected
NusG-C with the 30S head (fig. S11B, right, and
movie S5). 

Based on the observation that TTC-B is com- 
patible with NusG bridging, NusA binding, 
known functional aspects of transcription, and 
known functional aspects of translation, we 
propose that TTC-B mediates NusG-dependent, 
NusA-dependent transcription-translation cou- 
pling in E. coli. 

The structures presented were determined 
in the presence of CHAPSO, a nonionic deter- 
gent that has been used extensively in cryo-EM 
structural analysis of RNAP and RNAP com- 
plexes to improve structural homogeneity by 
disrupting nonspecific complexes and weak 
complexes, and by improving rotational orien- 
tation distributions of particles by reducing in- 
teractions with the air-water interface (14-18). 
Analogous structure determination in the ab- 
sence of CHAPSO yielded low-resolution maps 
of TTC-A for nucleic acid scaffolds with mRNA 
spacer lengths of 4, 5, 6, and 7 codons (figs. S12 
and S13, table S2, and movie S6) and of two 
additional complexes, TTC-C and TTC-D, for nu- 
cleic acid scaffolds with mRNA spacer lengths 
of 7, 8, and 9 codons (figs. S13 to S16, table S2, 
and movies S7 to S11). The fact that TTC-C and 
TTC-D are observed only in the absence of 
CHAPSO suggests that TTC-C and TTC-D may 
involve relatively weak interactions. In TTC-C
and TTC-D, interactions between RNAP and
ribosome are mediated by RNAP β sequence
insert 2 [βSI2; also known as βi9 (27, 28)], a
60-Å-long α-helical antiparallel coiled coil
flexibly tethered to the rest of RNAP, and the
main interaction is an electrostatic interaction
between the tip of βSI2 and the ribosome
30S subunit (figs. S15 and S16). In TTC-C, the 
orientation of RNAP relative to the ribosome 
is compatible with NusG bridging (fig. S15), 
and in TTC-D, the orientation of RNAP rela- 
tive to the ribosome is incompatible with NusG 
bridging (fig. S16). The structures suggest that 
TTC-C and TTC-D could play roles in NusG- 
dependent and NusG-independent transcription- 
translation coupling, respectively. The structural 
module that mediates the RNAP-ribosome 
interaction in TTC-C and TTC-D, βSI2, is not 
essential for growth in rich media (26), but is 
essential for growth in minimal media (26), 
implying that TTC-C and TTC-D are unlikely 
to be important for transcription-translation 
coupling in general, but may be important 
for transcription-translation coupling in spe- 
cific transcription units in specific regulatory 
contexts (6). Further analysis will be needed to 
determine whether, and, if so, in which con- 
texts, TTC-C and TTC-D function in transcription- 
translation coupling in E. coli. 

The results presented here define four structural
classes of TTCs: TTC-A [the previously
reported expressome; (12)], TTC-B, TTC-C, and
TTC-D, and show that TTC-B has structural
properties indicating that it mediates NusG-dependent,
NusA-dependent transcription-translation
coupling in E. coli. 

The results presented reframe our under- 
standing of the structural and mechanistic 
basis of transcription-translation coupling. 
The results provide high-resolution structures 
of the previously described expressome [(12); 
TTC-A] that demonstrate its incompatibility 
with general transcription-translation coupling. 
In addition, the results provide high-resolution 
structures of a new structural state, TTC-B, 
with properties assignable to general, NusG- 
dependent, NusA-dependent transcription- 
translation coupling. Our results also show 
that NusG stabilizes TTC-B by bridging RNAP 
and the ribosome 30S head, that NusA sta- 
bilizes TTC-B by bridging RNAP and the ri- 
bosome 30S body, and that NusA serves as a 
coupling pantograph that bridges RNAP and 
the ribosome 30S body in a flexible manner 
that allows rotation of RNAP relative to the 
ribosome 30S body. Finally, the results pro- 
vide testable new hypotheses regarding the 
identities of the RNAP and NusA structural
modules crucial for transcription-translation
coupling (RNAP β′ ZBD and NusA KH1) and
the interactions made by those structural
modules (interactions with ribosomal protein
S3 in the 30S head and interactions
with ribosomal proteins S2 and S5 in the 30S
body). 
ULTRACOLD PHYSICS 


Direct laser cooling of a symmetric top molecule 


Debayan Mitra*†, Nathaniel B. Vilas†, Christian Hallas, Loic Anderegg, Benjamin L. Augenbraun, 
Louis Baum, Calder Miller, Shivam Raval, John M. Doyle 


Ultracold polyatomic molecules have potentially wide-ranging applications in quantum simulation 
and computation, particle physics, and quantum chemistry. For atoms and small molecules, direct 
laser cooling has proven to be a powerful tool for quantum science in the ultracold regime. However, 
the feasibility of laser-cooling larger, nonlinear polyatomic molecules has remained unknown because 
of their complex structure. We laser-cooled the symmetric top molecule calcium monomethoxide 
(CaOCH3), reducing the temperature of ~10⁴ molecules from 22 ± 1 millikelvin to 1.8 ± 0.7 millikelvin 
in one dimension and state-selectively cooling two nuclear spin isomers. These results demonstrate 
that the use of proper ro-vibronic transitions enables laser cooling of nonlinear molecules, thereby 
opening a path to efficient cooling of chiral molecules and, eventually, optical tweezer arrays of 
complex polyatomic species. 


Laser cooling of atomic systems has enabled
substantial advances in quantum
simulation, precision clocks, and quantum
many-body physics (1-4). Extension
to a diversity of complex polyatomic molecules
would provide qualitatively new and
improved platforms for these fields. For example,
the parity doublets that result from
rotations of a molecule around its principal
axis, a general feature of symmetric top molecules,
give rise to highly polarized states with
structural features that are greatly desirable
for both quantum science and precision measurement
(5-7). However, the same complexity
that provides these advantages makes laser
cooling of nonlinear polyatomic molecules,
including symmetric tops, challenging. Recent
theoretical proposals have nonetheless suggested
that laser cooling of some nonlinear
molecules is a practical possibility (8-10).
Laser cooling relies on repeatedly scattering
photons from an atom or molecule via rapid
optical cycling, removing energy and entropy
with directed momentum kicks and spontaneous
emission events. Although other direct
(11-14) and indirect (15, 16) methods of slowing,
cooling, and trapping molecules have
been used, direct laser cooling has successfully
brought a number of diatomic (17-20)
and linear triatomic (21-23) molecules into
the submillikelvin regime, with phase-space
density increases of many orders of magnitude.
The ability to rapidly cycle photons,
which is essential to laser cooling, naturally
also allows for efficient quantum state preparation
and readout (24, 25), which are necessities
for proposed quantum computation and
simulation platforms using ultracold molecules,
including those proposed for symmetric
top molecules (5, 6). 
The established recipe for achieving optical
cycling and laser cooling of molecules requires
three key ingredients: strong electronic transitions
between two fully bound molecular
states; diagonal Franck-Condon factors (FCFs),
which limit branching to excited vibrational
levels; and rotationally closed transitions. Here,
we satisfied these conditions for the symmetric
top molecule CaOCH3 using two distinct optical
cycling schemes to enable rapid scattering
of photons. We efficiently reduced the
transverse temperature of a CaOCH3 molecular
beam from 22 ± 1 mK to 1.8 ± 0.7 mK and
scattered more than 100 photons while using
only a few lasers, despite the presence of 12
vibrational modes. The state selectivity of the
cooling is intimately connected to the nuclear
spin statistics of the molecule as well as to its
rigid-body angular momentum along the symmetry
axis, denoted by the quantum number
K″ in the ground state (Fig. 1, A and D). These
distinctive features of symmetric top molecules
were not accessible to previously laser-cooled
diatomic and triatomic molecules.
We studied laser cooling of both nuclear-spin
isomers (NSIs) of CaOCH3, each of which
corresponds to a specific set of K states. To
cool the symmetric (ortho) NSI, we laser-excited
molecules in ground states with K″ = 0
(Fig. 1, A to C). To high order, K″ = 0 states
effectively “freeze out” the particular additional
complexity of symmetric top molecules
relative to their linear counterparts, and this
cooling approach looks similar to that used
for diatomic and linear triatomic molecules. 


Fig. 1. CaOCH3 laser cooling schemes. (A and D) We use two optical cycling schemes, which target 
molecules differing in their angular momentum quantum number K and their nuclear spin statistics 
(represented by arrows on the hydrogen nuclei). (B and C) For ortho-CaOCH3, we address the 0₀, 4₁, and 3₁ 
vibrational levels of the X̃ ²A₁ (K″ = 0) ground state. Vibrational branching ratios are illustrated by downward 
arrows. Rotational closure is achieved by addressing N″ = 1, J″ = 1/2, 3/2 ground-state manifolds. The 
total parity of each state is indicated by + and − signs. (E and F) For para-CaOCH3, we address the 0₀ and 4₁ 
vibrational levels of the X̃ ²A₁ (K″ = 1) electronic ground state, driving transitions from N″ = 1 and N″ = 2 to 
achieve rotational closure. Each J state contains an unresolved parity doublet denoted by ±. See (28) for 
further details of the optical cycling scheme. 

Because of the favorable FCFs of CaOCH3, we
used only three lasers to address the 0₀ (no
excitation), 4₁ (Ca-O stretch), and 3₁ (O-C
stretch) vibrational manifolds of the X̃ ²A₁
electronic ground state, enabling each molecule
to scatter an average of 120 photons
before being lost to other vibrational levels
(26, 27). To laser-cool the asymmetric (para)
isomer, we instead excited ground states that
have K″ = 1 (Fig. 1, D to F). In this case, the
existence of unresolved (opposite) parity doublets
means that full optical closure required
addressing additional rotational components
of the X̃ ²A₁ ground state. Using three cooling
and repumping lasers enabled scattering an
average of 30 photons before molecules were
optically pumped into a fourth ro-vibronic
state. Including three additional lasers would
allow scattering 120 photons on average,
just as in the ortho-CaOCH3 scheme. Because
both cooling schemes are limited only
by vibrational decay pathways, scattering
>1000 photons could be achieved by adding
approximately four additional repumping
lasers (26, 28). 

Laser cooling of CaOCH3 was accomplished 
using the magnetically assisted Sisyphus effect, 
a highly efficient and robust cooling method 
first demonstrated with atoms (17, 21, 22, 29-31). 
In this method, molecules pass through a stand- 
ing wave of near-resonant, linearly polarized 
light containing all of the optical frequencies 
necessary to establish optical cycling. For blue-detuned
light (Δ > 0), as a molecule approaches
a peak in the periodic, ac Stark-shifted potential
landscape formed by the standing wave, it
returns nonadiabatically to a potential minimum
by a combination of optical pumping
and magnetic state remixing (by an external
magnetic field |B| ~ 1 G aligned at an angle
θ ~ 45° to the laser polarization axis). Repeated
iterations of this process lead to cooling. Red-detuned
light (Δ < 0) results in heating via a
similar but inverted mechanism (28). 



Fig. 2. Apparatus and beam images. (A) Schematic of the experimental 
apparatus, illustrating the beam source, laser cooling, cleanup, and detection 
regions (not to scale). The cooling region contains a near-resonant standing 
wave generated by retroreflecting a single, linearly polarized Gaussian laser 
beam with a 6-mm 1/e² diameter. In the cleanup region, molecules are 
The experimental apparatus was similar to
one described previously (23). A schematic is
shown in Fig. 2A. Briefly, CaOCH3 molecules
were produced in a cryogenic buffer gas environment
by ablation of a calcium metal target
in the presence of methanol vapor. The resulting
beam had a mean forward velocity of 150 ±
30 m/s and was collimated to a transverse
temperature of ~22 mK by a 3 mm × 3 mm
square aperture immediately preceding the
cooling region, after which ~10⁴ molecules
(in a single pulse) remained with a density of
~10? cm⁻³. After laser cooling, the molecules
propagated ~50 cm and underwent time-of-flight
expansion in the direction transverse to
propagation, mapping the momentum distribution
onto the spatial profile of the beam.
Finally, in the detection region, the molecules
were addressed with resonant laser light and
the resulting fluorescence was imaged onto
an electron-multiplying charge-coupled device
(EMCCD) camera to extract spatial information,
and thus their transverse temperature.
Figure 2, B to D, shows representative beam
images of the ortho NSI (K″ = 0) for unperturbed,
Sisyphus-heated (Δ < 0), and Sisyphus-cooled
(Δ > 0) configurations; these images
clearly indicate strong optical forces manipulating
the molecular velocity distribution. 
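
The link between transverse temperature and the beam width seen after time-of-flight expansion can be sketched with a few lines of Python. This is only an illustrative estimate, not the Monte Carlo analysis used in the paper: it assumes a 1D Maxwell-Boltzmann velocity spread, a CaOCH3 mass of about 71.1 u (an assumed value computed from standard atomic masses), a point-like source, and the quoted ~50 cm flight distance and 150 m/s forward velocity. The function names are hypothetical.

```python
import math

K_B = 1.380649e-23         # Boltzmann constant, J/K
AMU = 1.66053907e-27       # atomic mass unit, kg
MASS_CAOCH3 = 71.1 * AMU   # assumed approximate mass of CaOCH3, kg


def rms_velocity(temperature_k, mass_kg=MASS_CAOCH3):
    """1D rms velocity sqrt(k_B T / m) for a thermal distribution."""
    return math.sqrt(K_B * temperature_k / mass_kg)


def tof_spread(temperature_k, distance_m=0.50, forward_velocity=150.0):
    """Transverse rms spread gained during free flight (source size ignored)."""
    flight_time = distance_m / forward_velocity
    return rms_velocity(temperature_k) * flight_time


if __name__ == "__main__":
    for label, temp in (("uncooled", 22e-3), ("cooled", 1.8e-3)):
        v = rms_velocity(temp)
        x = tof_spread(temp)
        print(f"{label}: v_rms = {v:.2f} m/s, spread = {x * 1e3:.1f} mm")
```

Under these assumptions the uncooled beam has a transverse rms velocity of roughly 1.6 m/s and the cooled one roughly 0.46 m/s, so the cooled profile should appear about 3.5 times narrower after the same flight time.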

By integrating along the direction of molecular
beam propagation, we obtained one-dimensional
(1D) beam profiles, shown in
Fig. 3 for both the ortho and para NSI cooling
schemes. The cooled and heated profiles fit well
to a sum of Gaussian distributions with two
distinct widths, corresponding to two classes
of molecules, those that were Sisyphus-cooled
and those that were not (28). The cooled molecules
were those with transverse velocities less
than the capture velocity, v < v_c. Molecules with
v > v_c were instead subject predominantly to
Doppler cooling and heating, depending on
laser detuning. For blue detuning (Δ > 0), a
large fraction of molecules fell within v_c and
were cooled into a central, narrow peak on top
of a broad Doppler-heated background. For
red detuning (Δ < 0), molecules slower than v_c
were heated while faster ones were Doppler-cooled.
This competition led to a concentration
of molecules at velocities where the Doppler
and Sisyphus forces balanced, corresponding
to approximately v_c. From the positions
of these two peaks, we estimated a capture
velocity v_c ~ 1.5 m/s for K″ = 0 cooling of the
ortho NSI, in good agreement with the model
described below. Unperturbed and resonantly
depleted profiles fit well to single Gaussian
distributions. 
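
The two-width Gaussian description of the cooled profile can be written down explicitly. The sketch below is a minimal illustration, not the fitting procedure of (28): it assumes both components share a common center, and all function and parameter names are hypothetical. It also shows how the fraction of signal in the narrow component follows from the Gaussian areas.

```python
import math


def gaussian(x, amplitude, center, sigma):
    """Single Gaussian peak evaluated at position x."""
    return amplitude * math.exp(-0.5 * ((x - center) / sigma) ** 2)


def two_width_profile(x, a_narrow, sigma_narrow, a_broad, sigma_broad, center=0.0):
    """Sum of a narrow and a broad Gaussian sharing one center (assumed)."""
    return (gaussian(x, a_narrow, center, sigma_narrow)
            + gaussian(x, a_broad, center, sigma_broad))


def gaussian_area(amplitude, sigma):
    """Integrated area of a Gaussian: A * sigma * sqrt(2 * pi)."""
    return amplitude * sigma * math.sqrt(2.0 * math.pi)


def narrow_fraction(a_narrow, sigma_narrow, a_broad, sigma_broad):
    """Fraction of the total integrated signal in the narrow component."""
    narrow = gaussian_area(a_narrow, sigma_narrow)
    broad = gaussian_area(a_broad, sigma_broad)
    return narrow / (narrow + broad)
```

In practice the four amplitudes and widths would be obtained from a least-squares fit to the measured profile; only the model and the area bookkeeping are shown here.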

The integrated area of each of the three ortho-CaOCH3
profiles with 1.1 W/cm² of cooling light
applied was ~50% that of the unperturbed profile
(Fig. 3A). This effect is understood to be 
due to losses to vibrational states that were 
not repumped, most notably X24,45 and _X? 
A,3)4,8,. Combining the observed depletion 
with branching ratios previously measured for
CaOCH3 (26), we determined that 80 (+100, -30)
photons were scattered in the cooling process
and 110 (+150, -40) photons were scattered in
resonant depletion (28). From this observation
we inferred an average scattering rate of ~2 ×
10⁶ s⁻¹ across the cooling region, which is similar
to scattering rates observed for laser cooling
of diatomic and linear triatomic molecules
(18-22, 32). We determined the temperature of
the molecules by fitting a Monte Carlo simulation
of the molecular beam propagation to
our data (28). This calculation gave an initial
transverse temperature T⊥ = 22 ± 1 mK, which
was reduced by Sisyphus cooling to T⊥ = 1.8 ±
0.7 mK. Combined with the enhancement in
on-axis molecule density seen in Fig. 3A, this
substantial temperature reduction corresponded
to a factor of 4 increase in the on-axis phase-space
density of the molecular cloud (33). Finally,
we varied the laser detuning, intensity,
and magnetic field for ortho-CaOCH3 Sisyphus
cooling and compared the results to those of a 
repumped out of the X̃ ²A₁ 3₁ (ortho-CaOCH3) and X̃ ²A₁(N″ = 2) 4₁ (para-CaOCH3) 
states before being imaged onto an EMCCD camera via laser-induced 
fluorescence (LIF) detection. (B to D) Beam images for ortho-CaOCH3 (K″ = 0) 
are shown for unperturbed (B), Sisyphus-heated (Δ = -15 MHz) (C), and 
Sisyphus-cooled (Δ = +25 MHz) (D) configurations. 
Fig. 3. Sisyphus-cooled and Sisyphus-heated beam profiles. (A and B) Integrated laser-induced 
fluorescence (LIF) versus position for ortho-CaOCH3 (K″ = 0) (A) and para-CaOCH3 (K″ = 1) (B) cooling 
schemes. Sisyphus cooling, at a positive detuning Δ = +25 MHz for the ortho isomer and Δ = +20 MHz for the 
para isomer, manifests itself as a narrowing of the detected distribution (blue), whereas Sisyphus heating 
appears as a bimodal distribution (Δ = -15 MHz for both; red). Unperturbed (cooling lasers off; gray) 
and resonantly depleted (Δ = 0; purple) profiles have the same width but different integrated area due to 
optical pumping into dark vibrational states. Solid curves are Gaussian fits as described in (28). 


model obtained by solving optical Bloch equa- 
tions for molecules subject to the cooling light. 
We found good agreement between experimen- 
tal results and the model, further strengthening 
our understanding of the cooling mechanism 
involved (28). 
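
As a rough consistency check on the photon numbers quoted above, one can use a simple geometric loss model. This is an assumption made here for illustration and is not the branching-ratio analysis of (26, 28): if each scattered photon carries a fixed probability p = 1/120 of pumping the molecule into an unaddressed state (taken from the quoted average of 120 photons before loss), the surviving fraction after n photons is (1 - p)^n. The sketch also back-computes the interaction time and length implied by 80 photons at a rate taken here as 2 × 10⁶ s⁻¹ and a 150 m/s forward velocity. Function names are hypothetical.

```python
import math


def photons_from_depletion(surviving_fraction, mean_photons_before_loss):
    """Photons scattered, assuming a constant loss probability per photon."""
    loss_per_photon = 1.0 / mean_photons_before_loss
    return math.log(surviving_fraction) / math.log(1.0 - loss_per_photon)


def implied_interaction(photons, scattering_rate, forward_velocity):
    """Interaction time (s) and length (m) implied by a photon count and rate."""
    time_s = photons / scattering_rate
    return time_s, time_s * forward_velocity


if __name__ == "__main__":
    n_est = photons_from_depletion(0.5, 120)
    t_int, l_int = implied_interaction(80, 2e6, 150.0)
    print(f"photons for 50% depletion: {n_est:.0f}")
    print(f"implied time: {t_int * 1e6:.0f} us, length: {l_int * 1e3:.1f} mm")
```

With these inputs the toy model gives about 83 photons for 50% depletion, close to the reported central value of 80, and the implied interaction length of 6 mm is comparable to the stated 6-mm beam diameter.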

Figure 3B shows beam profiles for para-CaOCH3
(K″ = 1) laser cooling, taken at a laser
intensity of 250 mW/cm². We observed notable
Sisyphus cooling and heating, although
the effect was weaker than for ortho-CaOCH3
(K″ = 0), as expected, because there were more
ground states coupled to the same excited
electronic state, leading to a lower scattering
rate by a factor of ~2 (28). Additionally, the
reduced laser intensity used, set by technical
limitations, resulted in a smaller capture
velocity and cooled fraction. Using the same
analysis as above, we found that the cooled
molecules scattered an average of 25 ± 10 photons,
corresponding to an estimated scattering
rate of ~0.75 × 10⁶ s⁻¹. Cooling of the para NSI
would be improved with higher laser intensity
and/or interaction length. 

Our results demonstrate the feasibility of
direct laser cooling of complex, nonlinear polyatomic
molecules into the millikelvin temperature
regime (34, 35). This work opens the
door to a number of future experiments that
span a range of modern physical and chemical
research frontiers. Because Sisyphus cooling is
effective down to the recoil limit (36) (corresponding
to ~500 nK for molecules with similar
mass to CaOCH3), these techniques could
be used to achieve bright, highly collimated,
few-microkelvin molecular beams useful for
precision measurements and studies of ultracold
chemistry (37). Efficient and state-selective
laser cooling of both nuclear spin isomers
also offers a method to separate them using
radiation-pressure beam deflection of specific
spin species, a topic of interest in physical
chemistry (38-41). By adding a small number
of other laser frequencies to the laser cooling
of CaOCH3 (26), optical tweezer arrays of symmetric
top molecules should be possible, as
recently accomplished with diatomic species
(25). These arrays would offer an ideal starting
point for realizing new polyatomic quantum
simulation and computation platforms
(5, 6). Laser cooling could also be extended to
asymmetric tops, including biochemically relevant
chiral molecules (9, 10, 42, 43). Finally,
laser cooling and trapping of the heavier symmetric
top molecule YbOCH3 would allow
precise searches for time reversal-violating
interactions at a previously inaccessible energy
scale, and ultracold chiral molecules such as
YbOCHDT could enable precision probes of
fundamental parity violation (7, 22). 
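
The order of magnitude of the recoil limit quoted above can be checked with a short calculation. This is a sketch under stated assumptions: the cooling wavelength of 629 nm and the mass of 71.1 u are illustrative values not given in this passage, and the convention T_r = (h/λ)² / (m k_B) is used (another common convention differs by a factor of 2). The function name is hypothetical.

```python
H = 6.62607015e-34     # Planck constant, J s
K_B = 1.380649e-23     # Boltzmann constant, J/K
AMU = 1.66053907e-27   # atomic mass unit, kg


def recoil_temperature(mass_u, wavelength_nm):
    """Recoil temperature (K) using T_r = p^2 / (m k_B) with p = h / lambda."""
    momentum = H / (wavelength_nm * 1e-9)   # single-photon momentum, kg m/s
    mass = mass_u * AMU
    return momentum ** 2 / (mass * K_B)


if __name__ == "__main__":
    # 71.1 u and 629 nm are assumed, illustrative inputs
    t_r = recoil_temperature(71.1, 629.0)
    print(f"recoil temperature: {t_r * 1e9:.0f} nK")
```

With these assumed inputs the result is several hundred nanokelvin, in line with the ~500 nK scale stated in the text; the exact figure depends on the convention and wavelength chosen.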
Self-limiting directional nanoparticle bonding 
governed by reaction stoichiometry 


Chenglin Yi, Hong Liu, Shaoyi Zhang, Yiqun Yang, Yan Zhang, Zhongyuan Lu, 
Eugenia Kumacheva, Zhihong Nie 


Nanoparticle clusters with molecular-like configurations are an emerging class of colloidal materials. 
Particles decorated with attractive surface patches acting as analogs of functional groups are used to 
assemble colloidal molecules (CMs); however, high-yield generation of patchy nanoparticles remains 
a challenge. We show that for nanoparticles capped with complementary reactive polymers, a 
stoichiometric reaction leads to reorganization of the uniform ligand shell and self-limiting nanoparticle 
bonding, whereas electrostatic repulsion between colloidal bonds governs CM symmetry. This 
mechanism enables high-yield CM generation and their programmable organization in hierarchical 
nanostructures. Our work bridges the gap between covalent bonding taking place at an atomic level 
and colloidal bonding occurring at the length scale two orders of magnitude larger and broadens 
the methods for nanomaterial fabrication. 


Clusters of inorganic nanoparticles (NPs)
exhibit synergistic properties, owing to
interactions between surface plasmons,
excitons, or magnetic moments of the
constituent NPs (1, 2). Emerging applications
of NP clusters in plasmonics (3), photon- 
ics (4, 5), and catalysis (6) necessitate precise 
control over their architecture and structural 
complexity; however, as of now, such control 
remains a challenge (7). Assembly of NPs into 
small clusters recapitulating the structure and 
symmetry of molecules—that is, colloidal mol- 
ecules (CMs)—has emerged as a promising 
strategy for generating nanostructures with 
programmable architectures (7, 8). 
Assembly of CMs is governed by the delicate 
balance of attractive forces (e.g., hydrogen 
bonding, Coulombic attraction, or solvophobic 
interactions) and repulsive interactions of elec- 
trostatic or steric origin (9). To achieve direc- 
tionality in interparticle interactions for CM 
assembly, colloidal building blocks are func- 
tionalized with a discrete number of attractive 
surface regions (patches) (10-14) or packed in 
droplets with subsequent solvent evaporation 
(15). The shortcomings of these approaches, 
especially for nanometer-sized particles, are 
low yield and a limited precision of CM fab- 
rication (14, 16, 17). For nanoscale particles, 
notable progress has been achieved for DNA- 
mediated assembly of CMs from NPs functionalized
with complementary single-stranded 
DNA (13, 18-20). However, for shape-isotropic 
NPs, the surface attachment of DNA is not 
regiospecific and NP bonding is not localized to 
a specific surface region, thus the self-assembly 
yields a mixture of CMs with different struc- 
tures (21, 22). Enhanced control over CM ar- 
chitecture requires extra steps of regiospecific 
patterning NPs with attractive DNA patches 
(23), or encapsulating NPs within predesigned 
DNA frames with programmed bonding sites 
(24, 25). 

We show that high-yield directional and self- 
limiting assembly of spherical NPs uniformly 
coated with polymer ligands can be achieved 
without engineering site-specific NP interac- 
tions. The concept of directional NP bonding 
is built on the mechanism of σ-bond formation 
by hybridization of atomic orbitals in mol- 
ecules. When the valence shell is completely 
filled, atoms in a molecule do not form new 
bonds, and the repulsion between the pairs of 
valence electrons determines the structure of 
the molecule (26). As an example, Fig. 1A shows 
a trigonal planar molecule of boron trifluoride 
(BF3), in which sp² hybridization of atomic orbitals
of boron yields three symmetric singly 
occupied orbitals that bond with orbitals of 
fluorine atoms. 

We used an acid-base neutralization reac- 
tion between the ligands capping two types of 
NPs. Figure 1B shows schematically the formation
of a CM with a BF3 molecular configuration. 
Two populations of inorganic NPs (designated 
as NP-A and NP-B), each tethered with distinct 
block copolymer ligands, mimic atoms of two 
elements. Each copolymer contains a NP- 
adjacent reactive block with complementary 
acid or base groups and an outer block act- 
ing as a steric stabilizer of the NPs. A typical 
combination of copolymer ligands used in the 
present work was poly(ethylene oxide)-b-poly(acrylic
acid-r-styrene) [PEO-b-P(AA-r-St)] for
NP-As and poly(ethylene oxide)-b-poly(N,N-dimethylaminoethyl
methacrylate-r-styrene)
[PEO-b-P(DMAEMA-r-St)] for NP-Bs (figs. S1 
to S6, tables S1 to S5, and supplementary text 
section I). Partial deprotonation of acid groups 
of the ligands on the NP-A makes the particle 
negatively charged (table S6). After mixing of 
NP-As and NP-Bs, a neutralization reaction 
between the acid and base groups in the co- 
polymer ligands leads to directional colloidal 
bonding. Unlike a reaction between two hard 
spheres that is limited to their contact zone, the 
acid and base groups on the NP-A and NP-B 
pair can access each other, owing to the flex- 
ibility of the copolymer molecules. The stoichi- 
ometry and reversibility of the reaction be- 
tween the ligands would regulate the number 
of bonds between the NP-As and NP-Bs by 
means of a self-limiting mechanism, that is, 
when the reactive groups in the ligand shell 
are completely consumed, no further bonding 
will take place. Furthermore, the Coulombic 
repulsion between the colloidal bonds con- 
taining charged groups would govern the CM 
symmetry. 

Spherical gold NPs with mean diameter, D, 
in the range from 10 to 40 nm were end-grafted 
with brush-like copolymer ligands and dis- 
persed in tetrahydrofuran, a good solvent for 
the copolymers (figs. S7 and S8) (27). The NP-As 
and NP-Bs had electrokinetic potentials of 
−65.18 ± 6.56 and 5.30 ± 9.54 mV, respectively
(table S6). After mixing of the solutions of 
NP-As and NP-Bs, the neutralization reac- 
tion between the ligands took place (figs. S9 
and S10). Within 3 to 5 min, the NP-As and
NP-Bs assembled into clusters with AB2 (40%),
AB3 (30%), and AB (12%) structures (Fig. 1, C
and D). In the AB2 clusters, two NP-Bs preferentially
attached at two opposite poles of a
NP-A (Fig. 1C, top). The average bonding angle,
∠BiABj, formed by NP-A, NP-Bi, and NP-Bj (i ≠ j),
was 151.2° ± 19.6°. As time evolved, the fraction
of AB3 structures gradually increased at
the expense of the fractions of AB
and AB2 clusters and free NPs (Fig. 1D). After
150 min, ~80% of the clusters had an AB3
structure (Fig. 1C, bottom), and the average
∠BiABj became 135.4° ± 13.2°. The AB2 to AB3
transition was caused by the formation of a
new third bond between the NP-A and NP-B
and the rearrangement of the two preexisting
bonds between NP-A and NP-Bs, reflected by
the pronounced increase in the fraction of
~120° bonding angles and the decrease in
the population of free NP-Bs (Fig. 1E and fig.
S11). In the 6 hours after the beginning of assembly,
the AB3 clusters further optimized
their structure to achieve a stable bond length
and a symmetric trigonal planar geometry
with an average ∠BiABj of 119.8° ± 12.8° (figs.
S12 and S13). The importance of electrostatic
repulsion between the colloidal bonds was
signified by the decrease in AB3 symmetry in
the presence of LiBr salt (fig. S14). The NP
bonding process to form AB3 CMs followed a
pseudo first-order reaction mechanism (fig. S15).

Fig. 1. Directional bonding of distinct NPs. (A) Schematic illustration of boron
(B) and fluorine (F) atoms and the structure of a sp²-hybridized BF3 molecule.
(B) Illustration of directional bonding of NP-A and NP-B, each coated with
distinct block copolymer ligands, to form an AB3 structure by the stoichiometric
reaction between the complementary reactive groups in the ligands. In the
polymer formula, m and n correspond to the number of repeat units, whereas
α and β represent content (%) of each comonomer in the randomly
copolymerized block. Me, methyl. (C) Representative scanning electron
microscopy (SEM) images of NP assemblies formed 3 min (top) and 150 min
(bottom) after mixing the NP-As and NP-Bs. (D) Evolution of CMs with ABx
structure (x is the number of NP-Bs) during the course of assembly. The star and
arrow symbols are provided to guide the eye. (E) Distribution of the bond
angles, ∠BiABj, in the CMs shown in (C). (F) Simulation snapshots illustrating
the kinetics of the formation of AB3 CMs at three different time points, t1, t2,
and t3. The PEO blocks in the ligands are not shown for ease of visualization. In
(D) and (E), 2000 NPs are analyzed.

The evolution of ∠BiABj and the distance
dA-Bi between the NP-A and NP-Bi in the AB3
structures was assessed with coarse-grained
Brownian dynamics simulations coupled with
the stochastic reaction model (movie S1, table
S7, and supplementary text section II) (28).
Although neutralization of weak acids and
bases is reversible (29), multiple reaction sites
prevented bonded NP-As and NP-Bs from detachment,
as indicated by an approximately
monotonic decrease in dA-Bi during the bonding
process (figs. S16 and S17). When the reaction
reached a dynamic equilibrium at a consumption
of ~60 to 80% of acid-base moieties, a
stable bond formed (table S8). Figure 1F illustrates
representative simulation snapshots
of the evolution of ABx structures. After bonding
the first NP-B to the central NP-A to form
an AB cluster, the second NP-B preferentially
attacks the NP-A at a more reaction-active
site, that is, opposite to the first bonded NP-B.
These two NP-Bs align at a ∠BiABj close to
180° to minimize the Coulombic repulsion between
the charged groups within the bonds.
After bonding the third NP-B to the NP-A,
attachment of other NP-Bs or NP-As to the
AB3 structure is limited, owing to the reaction
stoichiometry, Coulombic repulsion between
the CM and free NP-As, and steric constraints
imposed by the PEO blocks and reacted fragments
of the copolymers (fig. S18). Three
bonds further rearrange to reach a ∠BiABj of
120°, corresponding to a trigonal planar shape
of the CM. Once assembled, the CM retains its
structure, although bond angles and lengths
fluctuate with time. Furthermore, the simulation
shows that the assembly of NP-Bs with
NP-As without ionization of the acid groups in
ligands yields AB3 clusters with reduced symmetry
(fig. S19).

Fig. 2. Experimental and simulation study of the CM formation. (A) Simulation
snapshots and SEM images of ABx structures (x = 1 to 6) assembled from 36-nm
Au NP-As and 20-nm Au NP-Bs. The Mn of the P(AA-r-St) and P(DMAEMA-r-St)
blocks are, from left to right, 41.9 and 58.3 kg/mol (I), 31.1 and 58.3 kg/mol (II),
40.4 and 35.1 kg/mol (III), 40.4 and 29.3 kg/mol (IV), 40.4 and 20.1 kg/mol
(V), and 40.4 and 8.9 kg/mol (VI). (B) Distribution of CMs with different structures
in simulation (Sim.) and experiments (Exp.), corresponding to I to VI in (A).
The error bars indicate standard deviation. (C and D) SEM images of AB2 clusters
formed from 36-nm Au NP-As and 26-nm Au NP-Bs (C), and 36-nm Au NP-As
and 42-nm Au NP-Bs (D). (E and F) SEM images of CMs formed from 22-nm
Fe3O4 NP-As and 15-nm Au NP-Bs (E), and 15-nm Au NP-As and 25-nm Ag
NP-Bs (F). The Mn of P(AA-r-St) and P(DMAEMA-r-St) blocks are 43.2 and
58.3 kg/mol (C), 31.1 and 20.1 kg/mol (D), 22.1 and 39.4 kg/mol (E), and
51.6 and 20.1 kg/mol (F). Scale bars are 50 nm in (A), and 100 nm in (C) to (F).

We generated a library of CMs with different
structures by varying the number average
molecular weight (Mn) of the P(DMAEMA-r-St)
block on NP-Bs from 8.9 to 58.3 kg/mol
(corresponding to the root-mean-square end-to-end
distance of the unperturbed copolymer
chain from 8.14 to 25.13 nm). Figure 2A and
figs. S20 to S26 show CM structures obtained
in experiments and simulations (tables S7 to
S9). The CMs had a general formula ABx, in
which x NP-Bs were bonded to the
central NP-A (where x is a positive integer,
1 ≤ x ≤ 6). To achieve a high yield of CMs with
a targeted architecture, the feeding number
ratio of NP-Bs to NP-As was equal to x in the
ABx structure. With a decrease in the length
and hence the number of base groups of the
P(DMAEMA-r-St) block on NP-B, x increased
from 1 to 6, because a larger number of NP-Bs
were required to react with acid groups on
NP-A (table S9). The CMs had the structure
of binary molecular compounds, namely,
linear HF and BeF2, trigonal planar BF3,
tetrahedral CH4, trigonal bipyramidal PF5, and
octahedral SF6, with the corresponding atomic
hybrid orbitals of s (or p), sp, sp², sp³, dsp³, and
d²sp³. For instance, the average ∠BiABj for AB2
and AB3 clusters was 173.2° ± 6.4° and 119.8° ±
12.8°, respectively (figs. S27 and S28). The simulation
results for ABx structures were in agreement
with experimental CM configurations.
The AB, AB2, and AB3 structures were obtained
with 82.3, 86.4, and 75.6% yield, respectively,
close to the corresponding values
predicted by simulations (Fig. 2B).
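The bond angles quoted for the clusters can be compared with the ideal angles of the corresponding molecular geometries. The sketch below is only an illustration of how such an angle is computed from particle centers: the function name `bond_angle_deg` and the idealized unit-vector coordinates are assumptions made here, not the image-analysis procedure used in the study.

```python
import math

def bond_angle_deg(center, b_i, b_j):
    """Angle (degrees) at `center` between the bonds to `b_i` and `b_j`."""
    u = [p - c for p, c in zip(b_i, center)]
    v = [p - c for p, c in zip(b_j, center)]
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))

# Hypothetical, idealized NP-B positions around an NP-A placed at the origin.
origin = (0.0, 0.0, 0.0)
linear = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]                      # AB2-like
trigonal = [(math.cos(2 * math.pi * k / 3),
             math.sin(2 * math.pi * k / 3), 0.0) for k in range(3)]  # AB3-like
tetrahedral = [(1.0, 1.0, 1.0), (1.0, -1.0, -1.0),
               (-1.0, 1.0, -1.0), (-1.0, -1.0, 1.0)]              # AB4-like

for name, sites in [("linear", linear), ("trigonal planar", trigonal),
                    ("tetrahedral", tetrahedral)]:
    print(name, round(bond_angle_deg(origin, sites[0], sites[1]), 2))
```

Applied to measured particle centers instead of these ideal coordinates, the same function would give the per-cluster angles whose averages are reported in the text.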

The generality of the approach to CMs was 
demonstrated by assembling NPs with various 
dimensions and compositions. Figure 2, C
and D, shows AB2 structures formed by bonding
36-nm-diameter Au NP-As with 26- or
42-nm-diameter Au NP-Bs. The increase in
NP-B diameter was balanced by reducing the
Mn of the P(DMAEMA-r-St) block to maintain
a stoichiometric number of base groups on
NP-Bs (figs. S29 to S31 and tables S10 and S11).
We also formed compositionally heterogeneous
AB2 CMs from an Fe3O4 NP acting as a central
NP-A, which was bonded with two Au NP-Bs
(Fig. 2E and table S2), and from an Au NP-A
positioned between two Ag NP-Bs (Fig. 2F).

To validate that the reaction stoichiometry
governs the directional NP bonding, we examined
the variation of the CM structure with
ZA/B = NA/NB, the ratio of the total number
(NA) of acid groups on a single NP-A to the
total number (NB) of base groups on a single
NP-B. The value of ZA/B was changed by varying
the D of NP-As or NP-Bs, copolymer structure,
or the grafting density σ of copolymers as

$$Z_{A/B} = \frac{\alpha \cdot m}{\beta \cdot n}\,\frac{\sigma_A}{\sigma_B}\left(\frac{D_A}{D_B}\right)^{2}$$

where subscripts A and B correspond to the
characteristics of NP-A and NP-B, respectively,
and α·m and β·n are the number of
acid groups in the P(AAα-r-St1−α)m block and
base groups in the P(DMAEMAβ-r-St1−β)n block,
respectively, in the copolymer ligands. The value
of ZA/B is controlled by varying α, m, σA, and
DA for NP-As or β, n, σB, and DB for NP-Bs.

Fig. 3. General principles of directional NP bonding. (A to D) SEM images of CMs
showing the effect of σA (A), α (B), DB (C), and n (D). (E) Variation in x
in ABx structures with increasing ZA/B, controlled by σA of the ligands on NP-A,
α in the P(AAα-r-St1−α)m block of ligands on NP-A, DB of NP-Bs, and n
of P(DMAEMAβ-r-St1−β)n on NP-B. For each x, the average values of ZA/B (x) were
calculated with standard deviations. The solid line is provided to guide the eye.
The results of simulation correspond to structures in Fig. 2A.

Fig. 4. Hierarchical self-assembly of amphiphilic CMs. (A and E) Schematics of
self-assembly of amphiphilic AB CMs (A) and AB2 (E) CMs. (B and F) SEM images
of petal-like assemblies from amphiphilic AB CMs (B) and ribbons from
amphiphilic AB2 CMs (F). The copolymers used for NP-A and NP-B are PStz¢7-b-
P(AA0.32-r-St0.68)318 and PEO,s-b-P(DMAEMA0.41-r-St0.59)463 (B), and PStz¢7-b-
P(AA0.32-r-St0.68)318 and PEO,5-b-P(DMAEMA0.36-r-St0.64)156 (F), respectively.
(C and G) Simulated electric field intensity enhancement contours of an AB CM
(left) and the corresponding petal-like assemblies (right) (C), and an AB2 CM
(left) and the corresponding ribbon-like assemblies (right) (G) on glass substrates.
The electric field is polarized along the vertical direction. |E/E0|, electric field intensity
enhancement. (D and H) Normalized extinction of AB (D) and AB2 (H) CMs in
tetrahydrofuran (dashed lines) and their petal-like (D) and ribbon-like (H) assemblies
on glass substrates (solid lines) in experiment (red lines) and simulation (blue lines).
Scale bars are 100 nm in (B) and (F) and 50 nm in (C) and (G).

Based on the evaluation of the effect of α,
n, σA, and DB by changing one of these parameters
at a time, we propose the following
stoichiometry-driven NP bonding principles:

1) The grafting density σ determines the total
number of reactive groups per NP and hence
the value of x in the ABx structure (fig. S32A).
When NP-B characteristics are constant, an
increase in σA shifts the CMs toward ABx
structures with larger x, up to AB6 (Fig. 3A, fig. S33,
and table S12). However, at σA > 0.30 chain/nm²,
colloidal bonding is suppressed, owing to the
limited interpenetration of the ligands on reacting
NPs.

2) The value of x increases with increasing
α of the P(AAα-r-St1−α)m block or decreasing β
of the P(DMAEMAβ-r-St1−β)n block (fig. S32B).
An increase in α results in a transition from
AB to ABx structures with larger x, because more
NP-Bs are required to react with a larger number
of acid groups on the NP-A (Fig. 3B, fig.
S34, and table S13). In our work, CMs form
only in a specific range of α and β values, that
is, 0.10 ≤ α ≤ 0.36 and 0.29 ≤ β ≤ 0.41. When α
or β is smaller than the minimum values, the
CMs form at low yield, owing to the insufficient
NP reactivity (fig. S35), whereas when
α or β exceeds the maximum values, the NPs
are colloidally unstable (fig. S36).

3) The variation in D and n (or m) cooperatively
influences NP bonding by controlling
both ZA/B and the accessibility of reactive
moieties on interacting NP pairs (fig. S32, C
and D). The increase in DB and n for NP-B
leads to ABx structures with a reduced x, because
more base groups are available on NP-Bs
for reaction (Fig. 3, C and D; figs. S37 and S38;
and tables S14 and S15).

Figure 3E shows the experimental and computational
results on the dependence of ABx
structures on the ratio ZA/B. The variation in
the discrete values of x in ABx structures followed
a linear trend of x = ZA/B = NA/NB (or
NA = xNB), which determined the ratio between
NP-B and NP-A “reactants” in the reaction.
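The stoichiometric rule can be illustrated numerically. The sketch below assumes that the number of reactive groups on one NP scales as (reactive comonomer fraction) × (repeat units) × (grafting density) × (surface area, proportional to D²), which gives the acid-to-base ratio ZA/B defined above; the function names and all input values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def acid_base_ratio(alpha, m, sigma_a, d_a, beta, n, sigma_b, d_b):
    """Ratio of acid groups on one NP-A to base groups on one NP-B.

    alpha, beta : fraction of reactive comonomer in the reactive block
    m, n        : number of repeat units in the reactive block
    sigma_a/b   : grafting density (chains per nm^2)
    d_a/b       : NP diameter (nm); surface area scales with diameter squared
    """
    groups_per_chain_a = alpha * m
    groups_per_chain_b = beta * n
    return ((groups_per_chain_a / groups_per_chain_b)
            * (sigma_a / sigma_b)
            * (d_a / d_b) ** 2)

def predicted_x(z, x_max=6):
    """Nearest integer x for an ABx cluster, following the trend x = Z_A/B.

    Returns None outside 1..x_max, the range of structures reported in the text.
    """
    x = round(z)
    return x if 1 <= x <= x_max else None

# Hypothetical example: identical ligands on both NPs, NP-A twice the diameter of NP-B.
z = acid_base_ratio(alpha=0.3, m=300, sigma_a=0.1, d_a=36.0,
                    beta=0.3, n=300, sigma_b=0.1, d_b=18.0)
print(z, predicted_x(z))
```

With identical ligands, the ratio reduces to the surface-area ratio, so halving the NP-B diameter quadruples ZA/B. The rounding step is a simplification: the experiments show a distribution of structures around the targeted x rather than a single outcome.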

We engineered copolymer ligands to achieve 
programmable assembly of CMs in hierar- 
chical structures with tunable optical proper- 
ties (Fig. 4). In a proof-of-concept experiment, 
hydrophobic NP-As [generated by replacing 
the PEO block in the ligands with a polysty- 
rene (PSt) block] were bonded with hydrophilic 
NP-Bs tethered with PEO-b-P(DMAEMA-r-St)
copolymers to form amphiphilic AB CMs (Fig. 
4A and fig. S39A). The CM solution in tetrahy- 
drofuran was placed on top of the saturated 
aqueous solution of NaCl. (The latter is immis- 
cible with tetrahydrofuran.) After tetrahydro- 
furan evaporation, hydrophobic interactions 
between NP-A subunits governed CM assem- 
bly into petal-like structures (Fig. 4B). The self- 
assembly resulted in a polarization-dependent 
enhancement of electric near-field and red 
shift of the plasmonic bands of the assem- 
blies, owing to plasmonic coupling between 
the NP-As in the petal-like structures (Fig. 4, 
C and D, and fig. S40, A and B). Furthermore,
we achieved hierarchical assembly of amphiphilic
AB2 structures with a hydrophobic
center and two hydrophilic ends into linear
ribbons, owing to the side-by-side association
of the hydrophobic NP-As (Fig. 4E and fig.
S39B). Compared with AB2 CMs, additional
near-field enhancement and red shift of plas- 
monic bands were caused by coupling between 
NPs along the longitudinal axis of ribbons 
(Fig. 4, F and G, and fig. S40, C and D). 

We have developed a stoichiometry-mediated
colloidal bonding strategy for high-yield
fabrication of nanoscale CMs with well-defined 
structures. The experimental and computa- 
tional results reveal that the mechanism of 
colloidal bonding and the CM “formulas” are 
governed by the reaction between flexible 
polymer ligands on the NP surface, whereas 
the CM symmetry (bond angles) is controlled 
by electrostatic repulsion between the NPs. 
The design principles for CM generation show 
that the approach is applicable to NPs with 
different sizes and compositions. By tailoring 
the features of the polymer ligands, the interac- 
tions between CMs are engineered to guide 
their assembly at a higher hierarchical level. 
ATMOSPHERIC AEROSOLS


Multiphase buffer theory explains contrasts 
in atmospheric aerosol acidity 


Guangjie Zheng, Hang Su*, Siwen Wang, Meinrat O. Andreae, Ulrich Pöschl, Yafang Cheng*


Aerosol acidity largely regulates the chemistry of atmospheric particles, and resolving the drivers 

of aerosol pH is key to understanding their environmental effects. We find that an individual buffering 
agent can adopt different buffer pH values in aerosols and that aerosol pH levels in populated 
continental regions are widely buffered by the conjugate acid-base pair NH4+/NH3 (ammonium/
ammonia). We propose a multiphase buffer theory to explain these large shifts of buffer pH, and we 
show that aerosol water content and mass concentration play a more important role in determining 
aerosol pH in ammonia-buffered regions than variations in particle chemical composition. Our results 
imply that aerosol pH and atmospheric multiphase chemistry are strongly affected by the pervasive 
human influence on ammonia emissions and the nitrogen cycle in the Anthropocene. 


Aerosol acidity has attracted increasing
interest in atmospheric research because 
it influences the thermodynamics of gas- 
particle partitioning and the chemical 
kinetics of the formation and transformation
of air particulate matter (1-8). Understanding
the temporal and spatial variations
of aerosol pH in the atmosphere is crucial 
for accurate predictions of the properties of 
atmospheric aerosols and their effects on health, 
ecosystems, and climate (9-12). In marine environments,
the uptake of acidic gases like SO2,
H2SO4, and HNO3 may rapidly consume the
alkalinity and reduce the pH of sea salt aerosols 
(13, 14). For continental air masses in the south- 
eastern United States, Weber et al. (15) have 
suggested that aerosol pH is buffered in the 
range of ~0 to 2 because of the interaction of 
aqueous (NH4)2SO4-NH4HSO4 with gaseous
NH3. However, their later studies have at- 
tributed the elevated pH levels in northern 
China (~3 to 6) (8, 16-19) mainly to changes in 
particle chemical compositions—i.e., a shift from 
sulfate- to nitrate-dominated aerosols (12, 19)— 
whereas Cheng et al. (8) have highlighted the 
role of ammonia and alkaline aerosol compo- 
nents from natural and anthropogenic emissions 
in understanding aerosol pH in this region. 
Despite these advances, it is still unclear 
how aerosol pH is buffered in other continen- 
tal regions, such as northern China, compared 
with the southeastern United States. To answer 
this question, we first performed numerical 
model calculations with the state-of-the-art 
thermodynamic model ISORROPIA (20) to 
examine the response of pH in aerosols upon 
the addition of sulfuric acid under different 
conditions that are characteristic of the southeastern
United States (15, 21), the North China
Plain (8, 22), northern India (23), and western 
Europe (24) [table S1 and (25)]. For reference, 
we also calculated the response of an aqueous 
solution of Na2SO4. As shown in fig. S1, the
Na2SO4 solution exhibits the expected steep
decrease of pH upon acid addition. For aerosol
systems, however, the pH does not show a
substantial decrease until the added amount
of acid (H+ equivalent) reaches ~20% of the
initial amount of anions in the aqueous par- 
ticles (molar ratio), which indicates an ad- 
ditional buffering effect. To further investigate 
this phenomenon, we focus on the scenarios 
for the southeastern United States (SE-US) 
and for the North China Plain (NCP), which 
have been intensively investigated and dis- 
cussed in earlier studies. As indicated in 
table S1, the SE-US scenario is characterized 
by relatively low aerosol concentration, low 
aerosol water content (AWC), and high tem- 
perature, as observed under clean-air summer 
conditions in the southeastern United States. 
By contrast, the NCP scenario is characterized 
by the high aerosol concentration, high AWC, 
and low temperature observed during extreme 
winter haze events in the Beijing region. 

In aqueous solutions, the pH of different
buffer systems is usually determined by the pKa
(where Ka is the acid dissociation constant)
of the buffering agents (26). Accordingly, the
different pH buffer levels in fig. S1 would suggest
different buffering agents corresponding
to different particle chemical compositions.
To identify the most relevant buffering agents
and key controlling parameters, we introduce the
concept of a multiphase buffering capacity
that describes the resistance to pH changes
upon input of acids or bases in an aerosol
multiphase system in analogy to the traditional
buffering capacity of bulk aqueous solutions.
The buffering capacity β is defined as the ratio
between the amount of acid or base added
to the system (n_acid or n_base, in moles per
kilogram) and the corresponding pH change
in the aqueous phase of the system, or β =
−dn_acid/dpH = dn_base/dpH. The larger the
buffering capacity β, the less the pH will change
upon the addition of acids or bases.
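The definition β = dn_base/dpH can be checked numerically on the simplest possible case, the self-buffering of pure water. The sketch below is an illustration under stated assumptions, not part of the study's model: it assumes an ideal solution, Kw = 1.0 × 10⁻¹⁴, and a strong base whose net amount follows from charge balance as [OH⁻] − [H⁺]; the function names are hypothetical.

```python
import math

KW = 1.0e-14  # assumed water dissociation constant near 298 K

def net_base_added(ph):
    """Net strong base (mol/kg) needed to bring pure water to `ph` (ideal solution).

    From charge balance, n_base = [OH-] - [H+] = Kw/[H+] - [H+].
    """
    h = 10.0 ** (-ph)
    return KW / h - h

def buffering_capacity_numeric(ph, step=1e-4):
    """beta = d(n_base)/d(pH), estimated with a central finite difference."""
    return (net_base_added(ph + step) - net_base_added(ph - step)) / (2.0 * step)

def buffering_capacity_water(ph):
    """Analytical derivative of the same expression: ln(10) * (Kw/[H+] + [H+])."""
    h = 10.0 ** (-ph)
    return math.log(10.0) * (KW / h + h)

for ph in (3.0, 7.0, 11.0):
    print(ph, buffering_capacity_numeric(ph), buffering_capacity_water(ph))
```

The two estimates agree closely, and both are smallest near pH 7 and grow toward strongly acidic or alkaline conditions, which is the water background on top of which the contributions of individual buffering agents appear.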

Figure 1A shows the buffering capacities for
the SE-US and NCP aerosol scenarios and for
bulk aqueous solutions of the individual buffering
agents (i.e., conjugate acid-base pairs
NH4+/NH3, HSO4−/SO4²−, and HNO3/NO3−)
as derived from numerical simulations of the
gas-liquid and acid-base equilibria [see materials
and methods, section M1; results of the
northern India and western Europe scenarios
are in fig. S2; and results of organic buffers are
in the supplementary text, section S7 (25)]. In
both aerosol scenarios, the largest buffering
capacity is obtained for the acid-base pair
NH4+/NH3 followed by HSO4−/SO4²− and HNO3/
NO3−. The peak buffer pH value (defined as the
pH corresponding to the highest local maximum
of β) for the SE-US scenario is ~0.7, and
the peak buffer pH value for the NCP scenario
is ~4.5. Thus, the buffer pH ranges (i.e., peak
buffer pH ± 1) (26, 27) closely match the aerosol
pH ranges previously reported for the southeastern
United States and for Beijing, respectively.
This indicates that the conjugate acid-base
pair NH4+/NH3 is the main buffering agent in
both the SE-US and NCP aerosol scenarios.

This finding raises the question of how the 
same buffering agent can stabilize the aerosol 
pH at very different levels. As shown in Fig. 1A, 
in bulk aqueous solution, the peak buffer pH
of NH4+/NH3 is ~9.2, but in the NCP and SE-US
aerosol scenarios, it shifts to much lower values
of ~4.5 and ~0.7, respectively. By contrast, the
peak buffer pH of the conjugate acid-base pair
HNO3/NO3− shifts in the opposite direction
from ~−1.5 in the bulk aqueous solution to
higher values of ~0.2 and ~3.8 in the NCP and
SE-US scenarios, respectively. The conjugate
acid-base pair HSO4−/SO4²−, on the other hand,
exhibits similar peak buffer pH values of ~2 
in all three scenarios (Fig. 1A). These differ- 
ences and shifts of peak buffer pH reflect spe- 
cial features of the aerosol multiphase buffer 
system that go beyond the traditional buffer 
theory for bulk aqueous solutions, and they 
highlight the need for a mechanistic under- 
standing of the multiphase buffering mech- 
anism in atmospheric aerosols. 

To elucidate the underlying mechanisms and 
key parameters, we have developed a multi- 
phase buffer theory and derived an analytical 
expression for the buffering capacity of a buf- 
fering agent X (conjugate acid-base pair) in an 
aerosol multiphase buffer system as detailed 
in section S1 (25) 


$$\beta = 2.303\left(\frac{K_w}{[\mathrm{H^+}]} + [\mathrm{H^+}] + \sum_i \frac{K_{a,i}^{*}\,[\mathrm{H^+}]}{\left(K_{a,i}^{*} + [\mathrm{H^+}]\right)^{2}}\,[X_i]_{tot}^{*}\right) \tag{1}$$

Here, Kw is the water dissociation constant, and
[Xi]tot* represents the total equivalent molality
of the buffering agent Xi, including the
gas phase and aqueous phase of both conjugate
acid-base species (e.g., the sum of NH3(g),
NH3(aq), and NH4+(aq) for the buffering agent
NH4+/NH3). Ka,i* is an effective acid dissociation
constant of the buffering agent Xi and can be
expressed by


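$$K_{a,\mathrm{BOH}}^{*} = \frac{[\mathrm{H^+(aq)}]\left([\mathrm{BOH(aq)}] + [\mathrm{BOH(g)}]\right)}{[\mathrm{B^+(aq)}]} = K_{a,\mathrm{BOH}}\left(1 + \frac{\rho_w}{H_{\mathrm{BOH}}\,R\,T\,\mathrm{AWC}}\right) \tag{2A}$$

$$K_{a,\mathrm{HA}}^{*} = \frac{[\mathrm{H^+(aq)}]\,[\mathrm{A^-(aq)}]}{[\mathrm{HA(aq)}] + [\mathrm{HA(g)}]} = K_{a,\mathrm{HA}} \Big/ \left(1 + \frac{\rho_w}{H_{\mathrm{HA}}\,R\,T\,\mathrm{AWC}}\right) \tag{2B}$$

for volatile base BOH and volatile acid HA that
dissociate in the form

$$\mathrm{B^+(aq)} + \mathrm{H_2O} \rightleftharpoons \mathrm{H^+(aq)} + \mathrm{BOH(aq)} \tag{2C}$$

$$\mathrm{HA(aq)} \rightleftharpoons \mathrm{H^+(aq)} + \mathrm{A^-(aq)} \tag{2D}$$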


As shown in Eq. 2, the effective dissociation
constant Ka,i* depends on the classical dissociation
constant Ka,i as well as on the Henry's
law coefficient Hi (gas-particle partitioning
constant) (in moles per liter per atmosphere)
and on the AWC (in micrograms per cubic
meter), i.e., the amount of liquid water in
the aerosol multiphase system. Here, ρw is
the liquid water density (~10¹² µg m⁻³), R is the
gas constant (8.205 × 10⁻² atm L mol⁻¹ K⁻¹), and
T is the absolute temperature (in kelvin). Note
that gas concentrations in square brackets are
expressed in units of equivalent molality (in
moles per kilogram of water) [see section M1
(25)]. The expression of Eqs. 1 and 2 in the
other unit system can be found in section S2.
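The dependence of the effective dissociation constant on aerosol water content can be sketched in a few lines. The code below assumes the gas-to-aqueous ratio ρw/(H·R·T·AWC) implied by the units given above, so that pKa* = pKa − log10(1 + ratio) for a volatile base and pKa* = pKa + log10(1 + ratio) for a volatile acid. The function names are hypothetical, and the ammonia constants (pKa ≈ 9.25, H ≈ 60 mol L⁻¹ atm⁻¹ near 298 K) are approximate textbook values assumed here, not the entries of table S2; their temperature dependence and activity corrections are ignored.

```python
import math

R_GAS = 8.205e-2    # atm L mol^-1 K^-1, as quoted in the text
RHO_WATER = 1.0e12  # ug m^-3, liquid water density as quoted in the text

def gas_to_aqueous_ratio(henry, temperature, awc):
    """Ratio [X(g)]/[X(aq)] in equivalent molality: rho_w / (H * R * T * AWC).

    henry in mol L^-1 atm^-1, temperature in K, awc in ug m^-3.
    """
    return RHO_WATER / (henry * R_GAS * temperature * awc)

def effective_pka_volatile_base(pka, henry, temperature, awc):
    """pKa* for a volatile base: Ka* = Ka * (1 + ratio), so pKa* is lowered."""
    return pka - math.log10(1.0 + gas_to_aqueous_ratio(henry, temperature, awc))

def effective_pka_volatile_acid(pka, henry, temperature, awc):
    """pKa* for a volatile acid: Ka* = Ka / (1 + ratio), so pKa* is raised."""
    return pka + math.log10(1.0 + gas_to_aqueous_ratio(henry, temperature, awc))

# Assumed, approximate constants for NH4+/NH3 near 298 K (not taken from table S2).
PKA_NH4 = 9.25
H_NH3 = 60.0

for awc in (1.0, 10.0, 100.0, 1000.0):
    print(awc, round(effective_pka_volatile_base(PKA_NH4, H_NH3, 298.0, awc), 2))
```

In this simplified picture each tenfold increase in AWC raises pKa* of the ammonia pair by about one unit while the gas phase dominates, and pKa* approaches the bulk pKa only at unrealistically large water contents. Reproducing the scenario values in the text would additionally require the temperature-dependent constants of the study.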

By solving Eq. 1, we can find a local maximum
of β at pH = pKa,i*; i.e., the peak buffer
pH of the agent Xi is determined by Ka,i*.
Therefore, a single buffering agent can have
its peak buffering capacity at very different
pH values in an aerosol multiphase buffer system.
According to Eq. 2A, for the buffering
agent NH4+/NH3 (volatile base), increasing
AWC results in a reduced Ka* and increased
pKa*. Thus, the traditional alkaline buffering
agent NH4+/NH3 effectively becomes an acidic
buffering agent (pKa* < 7) in multiphase systems
(Fig. 1A). For volatile acid buffering agents
(HNO3/NO3−), the AWC has the opposite effect
on pKa* (Fig. 1A). Moreover, the shift of pKa*
upon changes to the AWC is inversely proportional
to the partitioning coefficient Hi.
Thus, the volatile buffering agents HNO3/NO3−
and NH4+/NH3 (low Hi) exhibit large shifts,
whereas the peak buffer pH of HSO4−/SO4²−
hardly changes (high Hi) (table S2). As shown
in Fig. 1B, the effective dissociation constant
Ka* converges with the standard acid-base
dissociation constant of the buffering agent
for high values of AWC (Eq. 2), and the multiphase
buffer theory converges with the conventional
buffer theory in solution chemistry
[see sections S1 to S4 (25)]. Note that activity
coefficients must be considered in the
calculation of nonideal systems [see section
S3 (25)].
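That the buffering capacity of a single agent peaks at pH = pKa* can be verified with a short numerical scan. The sketch below assumes that the agent's contribution has the familiar bulk-buffer form, ln(10)·[X]tot*·Ka*[H⁺]/(Ka* + [H⁺])², with the effective constant in place of the classical one; the function names, the grid search, and the example inputs are illustrative assumptions rather than the study's numerical procedure.

```python
import math

def agent_buffer_capacity(ph, pka_eff, total_molality):
    """Contribution of one buffering agent to beta (mol/kg per pH unit).

    Assumed form: ln(10) * [X]tot * Ka* [H+] / (Ka* + [H+])^2.
    """
    h = 10.0 ** (-ph)
    k = 10.0 ** (-pka_eff)
    return math.log(10.0) * total_molality * k * h / (k + h) ** 2

def peak_ph(pka_eff, total_molality, lo=-2.0, hi=14.0, n=16001):
    """pH on a uniform grid (step 0.001 by default) where the contribution is largest."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return max(grid, key=lambda ph: agent_buffer_capacity(ph, pka_eff, total_molality))

# Effective pKa* values taken from the peak buffer pH quoted in the text
# (~0.7 for SE-US, ~4.5 for NCP, ~9.2 for bulk solution); the total molality
# of 1.0 mol/kg is an arbitrary placeholder that does not affect the peak position.
for label, pka_eff in [("SE-US", 0.7), ("NCP", 4.5), ("bulk", 9.2)]:
    print(label, round(peak_ph(pka_eff, 1.0), 3))
```

The peak height equals ln(10)·[X]tot*/4 regardless of pKa*, so changing the water content moves the buffered pH range without changing how strongly a given total amount of the agent buffers at its peak.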

Figure 2 further explains the thermodynamics
that causes the shift of buffer pH in a multiphase
system. The conventional bulk buffer
solutions (e.g., NH4+/NH3), assuming no exchange
with a gas phase, achieve their largest
resistance to pH change when the molality of
NH4+(aq) is equal to that of NH3(aq) (Eq. 3A) (26)

$$\mathrm{pH} = \mathrm{p}K_{a,\mathrm{NH_3}} + \log_{10}\frac{[\mathrm{NH_3(aq)}]}{[\mathrm{NH_4^+(aq)}]} \tag{3A}$$

where

$$K_{a,\mathrm{NH_3}} = \frac{K_w}{K_{b,\mathrm{NH_3}}} = \frac{[\mathrm{H^+(aq)}]\,[\mathrm{NH_3(aq)}]}{[\mathrm{NH_4^+(aq)}]} \tag{3B}$$

and Kb,NH3 is the base dissociation constant of
NH3 (table S2).

For gas-liquid multiphase systems, this equilibrium
is extended to the gas phase, and pH
becomes a function of Ka* and the ratio of total
NH3 in both gas and aqueous phase to NH4+ in
the aqueous phase [Eq. 4A; section S1 (25)].
Accordingly, the largest resistance to pH change
under given [NH3]tot* ([NH3]tot* = [NH3(aq)] +
[NH3(g)] + [NH4+(aq)]) is achieved when the
molality of NH4+(aq) is equal to the sum of
NH3(aq) and NH3(g). Note that Eq. 3 still holds
for the aqueous phase in the multiphase system

$$\mathrm{pH} = \mathrm{p}K_{a,\mathrm{NH_3}}^{*} + \log_{10}\frac{[\mathrm{NH_3(aq)}] + [\mathrm{NH_3(g)}]}{[\mathrm{NH_4^+(aq)}]} \tag{4A}$$

where

$$K_{a,\mathrm{NH_3}}^{*} = \frac{[\mathrm{H^+(aq)}]\left([\mathrm{NH_3(aq)}] + [\mathrm{NH_3(g)}]\right)}{[\mathrm{NH_4^+(aq)}]} = K_{a,\mathrm{NH_3}}\left(1 + \frac{\rho_w}{H_{\mathrm{NH_3}}\,R\,T\,\mathrm{AWC}}\right) \tag{4B}$$


Figure 2 shows the conditions where the
peak buffer pH values are achieved in different
systems; i.e., the same height of NH4+ and
NH3 in each panel represents their same molar
numbers in each system. Compared with bulk
solution, a fraction of NH3 partitions to the gas
phase in the NCP scenario, which results in less
NH3(aq) and a reduced [NH3(aq)]/[NH4+(aq)]
ratio, which leads to a lower pH in the aqueous
phase according to Eq. 3. Further reduction of
AWC in the SE-US scenario will push more NH3 to
the gas phase and further reduce the aerosol pH.

Fig. 1. Buffering capacity for aerosol multiphase systems compared with
bulk aqueous solution. (A) Buffer capacities (β) for the SE-US and NCP aerosol
scenarios and for bulk aqueous solution of individual buffering agents
(solid lines). The overall buffering capacity (black dashed lines) is obtained
by adding the individual buffer agent contributions to the solvent background
of water [fig. S3 and section S5 (25)]. The composition of the bulk
solution is assumed to have the same aqueous phase molality as in the SE-US
scenario. (B) Dependence of the peak buffer capacity (pKa*) of NH4+/NH3 on
aerosol water content (AWC) and temperature.

Figure 3 compares the contribution of individual
factors in explaining the difference in
aerosol pH between the NCP (~5.4) and SE-US
(~0.7) scenarios in fig. S1 [sections M2, S3, S5,
and S6 (25)]. The AWC appears to be the most
important factor, contributing 2.2 units of pH
change (ΔpH), followed by T, which contributes
another 1.6 units of ΔpH. Although earlier
studies have hypothesized that the marked
observed pH difference is caused by a transition
in particle chemical composition from a
sulfate- to a nitrate-dominated regime (12, 19),
our results show that the change of chemical
composition only plays a minor role. Different
AWCs [mainly caused by different aerosol
concentrations at a given relative humidity
(RH)] and T values can already explain a shift
of ~4 units of aerosol pH. The difference in
chemical composition contributes ~0.7 pH units
in total, with ~0.5 from the difference in total
NH3 fraction and ~0.1 and ~0.1 from the difference
in the fraction of NO3− and nonvolatile
cations (NVCs), respectively. Overall, different
AWCs and T values are the main drivers of the
pH difference between the NCP and SE-US
scenarios, whereas the higher fraction of total
NH3, NVCs, and NO3− in the NCP further enlarges
the difference.

In Fig. 4, we performed global model sim- 
ulations to identify the buffered regions and 
used both simulation and observational data to 
further compare the roles of AWC and chemical 
compositions in determining the variabilities 
of aerosol pH [sections M3, S83, and S6 and 
table S3 (25)]. As shown in Fig. 4A, ~40% of 
continental surface areas (not including Ant- 
arctica) and 71% of urban populated areas 
were buffered by the NH,*/NH3 agent with 
aerosol pH values mostly within the buffer range 
[pKann3* + 1 (26, 27)]. In these regions, without 
knowing the temporal and spatial variability 
of particle chemical composition, variations 
in AWC alone explain almost 70% (R² = 0.66, 
simulation; where R² is the coefficient of 
determination) and 80% (R² = 0.77, observation) 
of the variation of aerosol pH, assuming 
an NH4⁺/NH3-buffered system (Fig. 4B). On 
the other hand, when a constant AWC is as- 
sumed, distinct variations of aerosol acidity with 
particle chemical composition were observed, 
but they only played a secondary role (R² = 
0.22 and 0.26 for simulation and observation, 
respectively; Fig. 4C). We also found a reverse 
role for AWC and composition in regions that 
are not buffered by NH4⁺/NH3, where chemical 
composition differences alone explain >90% 
of the variations of aerosol pH (fig. S5). Overall, 
the buffering effect of ammonia suppresses the 
influence of compositional differences, making 
aerosol water content the primary determinant 
of aerosol pH. 
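
For readers less familiar with the coefficient of determination used above, a minimal sketch follows. It uses the common definition R² = 1 − SS_res/SS_tot; whether the study used this form or the squared correlation coefficient is not stated in the text, and the numbers below are hypothetical values chosen for illustration, not data from the study.

```python
def r_squared(reference, predicted):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


# Hypothetical values, for illustration only (not data from the study).
ph_reference = [1.0, 2.0, 3.0, 4.0, 5.0]  # e.g. pH from a full thermodynamic model
ph_from_awc = [1.5, 1.5, 3.0, 4.5, 4.5]   # e.g. pH predicted from AWC alone

r2 = r_squared(ph_reference, ph_from_awc)
print(f"R^2 = {r2:.2f}")
```

An R² of 0.66 then means that about two-thirds of the variance in the reference pH is reproduced by the simplified predictor.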

The multiphase buffering of aerosols and 
the key role of AWC in determining the peak 
buffer pH (pKa*) have implications for atmo- 


ation (25)]. 


Drivers of historical trends in aerosol pH can 
now be better understood and quantified [sec- 
tion S8 (25)]. In populated continental regions 
with high anthropogenic emissions and atmo- 
spheric concentrations of ammonia (28), aero- 
sol pH is likely controlled by the buffering pair 
NH4⁺/NH3 and can thus be approximated on 

Fig. 4. Drivers of aerosol pH diversity in ammonia-buffered regions. (A) Global distribution of continental 
surface regions buffered by NH4⁺/NH3. The color coding shows the maximum buffer capacity by NH4⁺/NH3 
(in moles per cubic meter of air). (B) Correlation of aerosol pH modeled by ISORROPIA with the predicted pH 
derived using constant buffering agent and multiphase buffer theory. Sim., simulation; Obs., observation. 

(C) Correlation of aerosol pH modeled by ISORROPIA with the predicted pH by ISORROPIA using constant 
AWC but variable compositions. Black circles and gray dots represent analysis based on model simulations and 
observations, respectively (see section M3 and table S3). Note that the observations are based on individual case 
studies and thus show a wider range of aerosol pH than the annual average simulation results. 


water content (Eq. 4). This opens up possibili- 
ties to reconstruct long-term trends and large- 
scale spatial distributions of aerosol pH. Other 
buffering agents, such as HSO4⁻/SO4²⁻, HCl/Cl⁻, 
or HCO3⁻/CO3²⁻, are likely to control aerosol 
pH over the oceans (13, 14, 29, 30), but the buffering 
effects of NH4⁺/NH3 may extend over 
ammonia-rich coastal and downwind regions. 
Thus, the notable human influence on ammonia 
emissions and the global nitrogen cycle in 
the Anthropocene substantially affects aerosol 
pH and atmospheric multiphase chemistry on 
a global scale. 

Long-term forest degradation surpasses 
deforestation in the Brazilian Amazon 


Eraldo Aparecido Trondoli Matricardi, David Lewis Skole, Olivia Bueno Costa, 
Marcos Antonio Pedlowski, Jay Howard Samek, Eder Pereira Miguel 


Although deforestation rates in the Brazilian Amazon are well known, the extent of the area affected 
by forest degradation is a notable data gap, with implications for conservation biology, carbon cycle 
science, and international policy. We generated a long-term spatially quantified assessment of forest 
degradation for the entire Brazilian Amazon from 1992 to 2014. We measured and mapped the full 
range of activities that degrade forests and evaluated the relationship with deforestation. From 1992 to 
2014, the total area of degraded forest was 337,427 square kilometers (km²), compared with 308,311 
km² that were deforested. Forest degradation is a separate and increasing form of forest disturbance, 
and the area affected is now greater than that due to deforestation. 


Several international initiatives, such as 
the Aichi Biodiversity Targets in the Convention 
on Biological Diversity, REDD+ 
(Reducing Emissions from Deforestation 
and Forest Degradation) in the United 
Nations Convention on Climate Change, and 
the Bonn Challenge, which focuses on restoration 
of degraded forests, require information 
on the rate and extent of forest degradation 
(1, 2). Yet, degradation of forest ecosystems is 
perhaps one of the more challenging types of 
disturbances to measure and monitor. The 
rate and extent of forest degradation in the 
Brazilian Amazon (BA) is a key component of 
a national strategy for climate change mitiga- 
tion and adaptation (3). One challenge with 
monitoring degradation is that it occurs within 
forests, leaving a standing stock of biomass 
and canopy cover that can make detection dif- 
ficult. Forest degradation in the BA and else- 
where is caused by an array of agents or drivers, 
with greater or lesser degrees of poorly quan- 
tified interaction between drivers or with de- 
forestation activities. Unlike deforestation, 
degradation events may reoccur with varying 
frequencies at the same location, sometimes 
several years later, and different types can spa- 
tially overlap. 

Fundamentally, forest degradation has been 
widely recognized as an important form of 
disturbance (4-6). However, previous efforts 
to measure and map degradation in the BA 
have focused on individual agents, such as 
logging (7), burned areas or active fires (8, 9), 
or fragmentation (10). Others have assessed 
degradation only indirectly; for instance, 
Baccini et al. (4) estimated carbon emissions 
from degradation as the difference between 
overall canopy damage and that attributed 
directly to deforestation (7). Other analyses 
have focused on specific sites or subregions (5) 
or on sampling in forest strata spatially associated 
with concentrated deforestation (12). In 
this analysis, our aim was to map the current 
BA-wide extent of forest ecosystems that have 
been degraded since 1992 and compare it with 
the area deforested. The analysis presented is 
a long-term (~23 years) BA-wide high-resolu- 
tion spatial analysis, intended to reveal how 
degradation has changed in magnitude and 
geographic distribution and to measure its 
permanence in the landscape. 

Forest disturbance by human activities in 
the BA occurs across a gradient of severity, from 
complete forest conversion to various inten- 
sities of degradation within forests. Deforesta- 
tion is the complete conversion of forests to 
another land use type, usually pasture in the 
BA. Forest degradation occurs within forests 
and is characterized by a loss of biomass within 
an intact canopy (6). Forest degradation is 
also a secondary result of deforestation, which 
produces edge effects and isolated forest 
patches in fragmented forests (13). These disturbances 
have important large-scale environmental 
consequences, including the release of 
greenhouse gases (14-16), alteration of water 
and energy balances (17, 18), loss of biodiversity 
(19, 20), and increasing incidence of infectious 
disease (21). In the BA, deforestation reached a 
peak rate in 2003 to 2004, at ~29,000 km² year⁻¹, 
bringing international attention and then 
national policies that reduced these rates 
significantly (22). By 2014, deforestation rates 
declined below ~6000 km² year⁻¹ based on satellite 
data analysis by the Amazon Deforestation 
Monitoring Project (PRODES) operated by the 
Brazilian Space Agency, Instituto Nacional de 
Pesquisas Espaciais (INPE) (23). 

To map and analyze the distribution and 
extent of degraded forest in the BA forest land- 
scape, we needed to use medium-resolution 
remote-sensing (30 m) data. An existing model 
(24-28) uses a stepwise semiautomated analysis 
of all Landsat images for the forest area of the 
Legal Amazon of Brazil (29). We mapped six 
types of forest disturbance: (i) deforestation, (ii) 
selective logging, (iii) understory fires in intact 
forests, (iv) fires on logged sites, (v) forest edge 
effects adjacent to deforested areas, and (vi) 
isolated forest fragments created by deforest- 
ation. The method (29) uses a visual digital 
object analysis framework, digital spectral 
analysis (canopy texture from spectral radiance 
variation and canopy density from spectral 
mixture analysis), and then iterative calibration 
using field data (26-28). A dataset was 
constructed from more than 1200 Landsat 
satellite digital images covering the entire BA 
forest area, which were then digitally analyzed 
for seven observation years (OYs): 1992, 1996, 
1999, 2003, 2006, 2010, and 2014. The six digi- 
tal map layers were stacked at each OY to de- 
lineate pixel overlay unions of new occurrences, 
persistent occurrences, overlapping occurrences, 
and sequential occurrences (figs. S1 to S3). 
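
The overlay logic described above can be illustrated with a toy example. The sketch below is not the authors' processing chain: the layer names, pixel coordinates, and the exact definitions of "new", "persistent", "sequential", and "overlapping" are assumptions chosen to show how stacked layers from two observation years might be compared.

```python
from itertools import combinations

# Hypothetical detections: each layer is the set of (row, col) pixels flagged
# for one disturbance type at one observation year (OY).
layers = {
    1992: {"logged": {(0, 0), (0, 1)}, "burned": {(2, 2)}},
    1996: {"logged": {(0, 1), (1, 1)}, "burned": {(0, 0), (1, 1), (2, 2)}},
}


def classify(prev, curr):
    """Compare two consecutive OYs (assumed definitions, see lead-in)."""
    prev_any = set().union(*prev.values())  # any disturbance at the earlier OY
    result = {"new": {}, "persistent": {}, "sequential": {}}
    for kind, pixels in curr.items():
        same_before = prev.get(kind, set())
        # Same type detected at both OYs.
        result["persistent"][kind] = pixels & same_before
        # Disturbed earlier, but by a different type.
        result["sequential"][kind] = (pixels & prev_any) - same_before
        # No disturbance of any type at the earlier OY.
        result["new"][kind] = pixels - prev_any
    # Pixels flagged by more than one type within the current OY.
    result["overlapping"] = set()
    for a, b in combinations(curr, 2):
        result["overlapping"] |= curr[a] & curr[b]
    return result


occurrences = classify(layers[1992], layers[1996])
print(occurrences)
```

Sets of pixel coordinates keep the example short; a real workflow at 30-m resolution over the whole BA would use raster arrays instead.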

Digital spatial data layers for deforestation 
were obtained from INPE’s long-term PRODES 
dataset (23) for overlay with degradation layers 
in OYs after 2000 through 2018. For deforest- 
ation mapping before 2000, we processed 
data as reported in (24) and (26). The INPE 
deforestation dataset is the official national 
reporting source and provides a logical bench- 
mark for comparing our estimates and maps 
of degradation. We developed and field-validated 
a periodic measurement model that produces 
accurate estimates of logging and burned area 
every 3 to 4 years for moderate- and high-intensity 
logging of removals of >10 m³ ha⁻¹ 
(24, 27, 29, 30). Forest edges are mapped only 
in undisturbed forest adjacent to deforested 
areas to 120 m. Edge areas adjacent to logging 
or burned scars are not counted in the edge 
counts. Isolated forests created by deforest- 
ation are mapped for all undisturbed forest 
patches between 1 and 100 km² in size. To report 
degradation of undisturbed forest, each 
pixel was assigned a single identifier according 
to a hierarchical rule (table S1). This excluded 
double counting of degradation occurring mul- 
tiple times on a single pixel of land, but the 
combinations and recurrences are retained 
in the database. 
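
A hierarchical rule of this kind can be sketched as follows. The priority order used here is hypothetical: the study's actual rule is given in its table S1, which is not reproduced in this text. The point is only to show how assigning one identifier per pixel avoids double counting while the full combination is kept alongside.

```python
from collections import Counter

# HYPOTHETICAL priority order, highest first. The real order is defined in
# the study's table S1 and may differ.
PRIORITY = ["burned", "logged", "edge", "isolated"]
PIXEL_AREA_KM2 = 30 * 30 / 1_000_000  # one 30-m pixel expressed in km²


def assign_class(flags):
    """Return a single identifier for a pixel from the set of types detected."""
    for kind in PRIORITY:
        if kind in flags:
            return kind
    return "undisturbed"


# Toy pixels: each entry is the set of degradation types detected on it.
pixels = [{"logged"}, {"edge", "logged"}, {"burned", "logged"}, set(), {"isolated"}]

labels = [assign_class(p) for p in pixels]         # one label per pixel
combinations_kept = [tuple(sorted(p)) for p in pixels]  # recurrences retained
area_km2 = {k: n * PIXEL_AREA_KM2 for k, n in Counter(labels).items()}

print(labels)
print(area_km2)
```

Because each pixel contributes to exactly one class, the class areas sum to the total mapped area.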

Average annual rates of new degradation 
are presented in Fig. 1A, compared with the 
rate of deforestation. These were derived from 
mapping the remote sensing-detected degradation 
at each OY and new degradation between 
each OY (tables S2 and S3). As expected, 
overall degradation declined with deforesta- 
tion rates and a concomitant decline in pro- 
duction of edge and isolated forest fragments, 
which make up a large fraction of all degra- 
dation during the early period. Declining de- 
forestation resulting from new policy measures 

Fig. 1. Quantitative results of forest degradation in the BA, 1992 to 2014. (A) Annual average rates of forest degradation; total and contributions 
from each type are shown as segmented bars, with the rates of logging and understory burned areas shown by the dashed line and rates of 
deforestation from INPE by the solid line. DD, dependent degradation types; ID, independent degradation types. (B) The cumulative area 
impact of forest degradation on the forest biome landscape in the BA by each type of degradation and comparison with the total area deforested 
during the analysis period, 1992 to 2014. 


reduced new edges and isolated forest after 
2006 to 2010, but edges began to decline earlier 
while deforestation remained high, most like- 
ly as a result of consolidation and fill-in of 
spatially dense and continuous deforested 
areas. Overall, the annual rate of all types of 
degradation declined over the time series, 
from a peak of 44,075 to 14,625 km² year⁻¹. 
Nonetheless, forest degradation rates ex- 
ceeded deforestation by almost threefold 
in 2014. 

Whereas rates of fragmentation decreased, 
rates of selective logging and understory burn- 
ing, two types of heavy-impact forest degrada- 
tion, slightly increased or remained stable over 
time. The area of new selective logging 
rose from 8,498 km² in 1992 to 1996 
to 22,952 km² in 2010 to 2014, reaching 
270% of its initial value (table S3). When combined 
with new burned forest, the area rises 
from 14,866 km² in 1992 to 1996 to 26,327 km² 
in 2010 to 2014, or 177% of its initial value. By 2006 
to 2010, the average annual rates of forest 
degradation by logging and burning were 
approximately equal to deforestation rates, 
and by 2014 degradation exceeded deforesta- 
tion (Fig. 1A). 
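
The percentages quoted for logging and for logging plus burning can be checked directly from the areas given in the text. A minimal sketch follows; note that 270% and 177% correspond to the ratio of the later area to the earlier one, so the net gains are about 170% and 77%.

```python
def ratio_and_increase(initial_km2, final_km2):
    """Return (final/initial, net percentage increase) for two areas."""
    ratio = final_km2 / initial_km2
    return ratio, (ratio - 1.0) * 100.0


# Areas quoted in the text (km²): 1992 to 1996 versus 2010 to 2014.
logging = ratio_and_increase(8_498, 22_952)
logging_plus_burning = ratio_and_increase(14_866, 26_327)

for label, (ratio, gain) in [("logging", logging),
                             ("logging + burning", logging_plus_burning)]:
    print(f"{label}: {ratio:.2f} times the initial area "
          f"({ratio:.0%} of it, a net gain of {gain:.0f}%)")
```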

To compare the amount of degraded forest 
in the BA today with the deforested area, all 
newly created degradation pixels were tracked 
and accumulated through time. There was no 
double counting of more than one type of 
degradation occurring at the same place, and 
pixels that were deforested by 2014 were 
removed (Fig. 1B). The total area of degraded forest 
created during the period of our analysis 
that remains present in the current landscape 
is 337,427 km², compared with 308,311 km² of 
deforested land. This estimate does not in- 
clude degradation that occurred before 1992. 
Much of the degradation was from edges and 
isolated forest fragments, but the total area 
degraded by logging and burning alone over 
this period was equivalent to almost half (43%) 
of the area deforested over this period. In most 
locations across the BA, there is more degraded 
forest than deforested land when considering 
only what occurred during the time frame of 
analysis (fig. S4). During this long-term period 
of observation, 40% of all degraded forest can 
be attributed to intensive logging and under- 
story fires and 60% is due to edges and iso- 
lated fragments of forest, which represents a 
notable increase in the logged and burned frac- 
tion later in the record. 

BA-wide estimates from the analysis were 
constructed at the original 30-m resolution 
and then aggregated in 200-km² grid cells for 
mapping and graphical display (Fig. 2). These 
maps show the cumulative impact of all degra- 
dation types. The map shows the status of for- 
est ecosystems in the BA, including the density 
and extent of degradation. The mapping is 
presented for the entire period of analysis 
and separately for the period before the down- 
turn in deforestation rates and the period 
after. Generally, degradation is more spatially 
dispersed across the landscape than defores- 
tation, which is concentrated in the often-cited 
“arc of deforestation” along the eastern and 
southern forest interface with the Cerrado 
Biome in Brazil, which comprises a region 
with vegetation types similar to African savan- 
nah. There are concentrated zones of high 
degradation close to older areas of deforestation, 
but degradation is also emerging in 
the western BA, particularly by new logging 
(Fig. 3). The spatial organization of logging 
suggests that it is increasingly decoupled from 
understory burning (figs. S5 and S6), where 
logging is relocating more distantly from the 
so-called arc of deforestation, whereas burned 
areas remain more restricted closer to the older 
areas of deforestation (figs. S7 and S8). Further- 
more, we found very little overlap of burned 
areas on logged areas, especially in the short 
term (4 to 8 years). 
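
The aggregation from 30-m pixels to 200-km² grid cells mentioned at the start of this section can be sketched as a block average. The assumption here is that the cells are square, which gives a side of about 14.1 km, or roughly 471 pixels; the text does not state the cell geometry. The tiny raster below is made up for illustration.

```python
import math

PIXEL_M = 30.0     # pixel size stated in the text
CELL_KM2 = 200.0   # grid-cell area stated in the text

# Assuming square cells: side length in metres, then in whole pixels.
cell_side_m = math.sqrt(CELL_KM2) * 1000.0
pixels_per_side = round(cell_side_m / PIXEL_M)


def block_fraction(raster, block):
    """Fraction of flagged (1) pixels in each block x block tile of a 0/1 raster."""
    rows, cols = len(raster), len(raster[0])
    out = []
    for r0 in range(0, rows, block):
        row_out = []
        for c0 in range(0, cols, block):
            tile = [raster[r][c]
                    for r in range(r0, min(r0 + block, rows))
                    for c in range(c0, min(c0 + block, cols))]
            row_out.append(sum(tile) / len(tile))
        out.append(row_out)
    return out


# Made-up 4 x 4 raster (1 = degraded pixel), aggregated with 2 x 2 blocks.
demo = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
fractions = block_fraction(demo, 2)
print(pixels_per_side, fractions)
```

With real data the block size would be `pixels_per_side` rather than 2, and each output value is the degraded fraction of one grid cell.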

The dominant local driver of degradation 
was mapped for each 200-km² grid cell (Fig. 4). 
Degradation related to deforestation, such as 
edges and isolated fragments, is important 
in the BA-wide landscape, not only in the older 
areas but also along the new frontiers. Logged 
areas are dominant in some specific areas 
where degradation is uniformly very high, and 
they are expanding to the west along a new 
frontier (Fig. 4 and figs. S7 to S9), whereas 
nodes of burned dominance are very spatially 
localized. Edge and isolated forest fragments 
are spatially and geographically extensive. 
Edges tend to be the prevalent and extensive 
type in the earliest years and then in the new 
frontier of western BA (Fig. 4, B and C), whereas 
isolated forest dominates some old areas of 
deforestation and degradation during the 
later years (Fig. 4C). In most places, all types 
of degradation are occurring in the landscape, 
although we found little evidence of significant 
spatial overlap and co-occurrence, even when 
considering degradation co-occurrences widely 
separated in time. 
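
Identifying the dominant driver in a grid cell amounts to picking the type with the largest area and expressing it as a share of all degradation in that cell. The sketch below uses made-up areas, and the function name is an illustrative choice, not the study's.

```python
def dominant_type(areas_km2):
    """Return (most abundant type, its percentage of all degradation in the cell)."""
    total = sum(areas_km2.values())
    if total == 0:
        return None, 0.0
    kind = max(areas_km2, key=areas_km2.get)
    return kind, 100.0 * areas_km2[kind] / total


# Hypothetical degraded areas (km²) inside two 200-km² grid cells.
mixed_cell = {"logged": 30.0, "burned": 10.0, "edge": 50.0, "isolated": 10.0}
even_cell = {"logged": 5.0, "burned": 5.0, "edge": 5.0, "isolated": 5.0}

print(dominant_type(mixed_cell))  # edge dominates with half of the degraded area
print(dominant_type(even_cell))   # four equal types give a 25% share
```

A share near 25% indicates four types of similar magnitude, and a share of 100% indicates a single type in the cell.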
Fig. 2. Spatial mapping of forest degradation in BA forest ecosystems. The maps present separately the inventory of all forest areas degraded by all degradation (top) 
not deforested, compared with the total deforested land (bottom), between 1992 and 2014. Maps on the left show all degradation or deforestation during the study 
period. Center maps show degraded forest or deforested areas after the downturn in deforestation rates. Maps on the right show areas degraded or deforested before the 

Fig. 3. Detected areas of forest degradation by logging and understory burning at each OY. Areas are mapped as a fraction of a 200-km² grid cell for the 
entire BA. The geography of logging is shown to have expanded from the older deforestation zone, often cited as the arc of deforestation, particularly after 2003. (A) Arrows 
show the general direction of the expanding logging frontier. (B) New distant forest degradation in Roraima. (C and D) New forest degradation from logging in 
the western Amazon. (E) The prominent forest degradation front in western Para. 

Results from an analysis of regional trends 
were surprising (figs. S7 to S9). New logging 
areas are demonstrably expanding beyond the 
older arc of deforestation into a new western 
frontier (Fig. 3 and fig. S7), but spatiotemporal 
trends for other types are less clear. Under- 
story burned areas remain predominant in 
the specific areas of concentrated deforest- 
ation, with temporal trends somewhat inter- 
mediary or stable, having increased in the 
early period of record and now declining with 
deforestation rates. Interestingly, creation of 
new edges and new isolated forest fragments 
has generally declined over the entire record, 
particularly as deforestation rates have de- 
clined, although their coverage area remains 
high in overall magnitude. In fact, areas with 
highest overall coverage and density are also 
experiencing declining rates, regardless of 
type. The new frontiers with high and increas- 
ing rates are still quantitatively low in mag- 
nitude. The highest densities of degradation 
are along the long-standing deforestation fron- 
tier but present declining trajectories, whereas 
the emerging frontiers with lower densities 
have increasing trajectories; these new and 
expanding regions will likely be dominant 
in the future. 

For degradation to be an important form of 
forest disturbance in the BA, it must persist 
in the landscape and not be immediately con- 
verted by deforestation. We spatially tracked 
at the 30-m pixel scale the survivorship of 
each cohort of new degraded forest through 
the time series (table S7 and fig. S10). Sur- 
vivorship is measured as the area and per- 
centage of a cohort of degraded forests that 
persists without being deforested for a given 
length of time. Logged areas persisted the 
longest, as more than half (57%) of the area 
survived at least 18 years, from 1996 to 2014 
(46% from 1996 survived to 2018). Fully one- 
third of logged areas in 1992 were still present 
in the 2018 landscape, some of which had 
been relogged. The other types of degradation 
had much lower 18-year survivorship by 2014, 
ranging from 28 to 31%. One-fifth to one-fourth 
of the other types of degradation mapped in 
1992 were likewise still present in 2018. Through the 
time series, survivorship was generally con- 
sistent but slightly increased after 2003, when 
deforestation rates declined. Some researchers 
have reported very low survivorship of logged 
areas for short periods of up to 4 years (31), 
but we found high short-term survivorship 
for all types of degradation in general but es- 
pecially for logged areas, and these ranged 
from 82 to 93%. Burned areas have consider- 
ably lower short-term survivorship, ranging 
from 50% in the earlier period to 86% later in 
the time series, as deforestation rates declined. 
Edge and isolated forest short-term survivorship 
ranged from ~50% in the early period to 
~80% later, but also increased as deforestation 
rates declined. Although deforestation policy 
did not influence logging or lead to a decline in 
burned area degradation, it did relieve conversion 
pressures so that these logged and burned 
areas now persist longer. Reduced conversion 
pressure has extended the persistence of edges 
and isolated forest, which exacerbates tree mortality 
and other ecological effects (fig. S10). 

Matricardi et al., Science 369, 1378-1382 (2020) 

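
Survivorship as defined above, the percentage of a cohort of degraded pixels that has not been deforested after a given time, can be computed per cohort as in the sketch below. The pixel identifiers and deforestation sets are hypothetical, and the function is an illustration rather than the study's implementation.

```python
def survivorship(cohort, deforested):
    """Percentage of cohort pixels that are not in the deforested set."""
    return 100.0 * len(cohort - deforested) / len(cohort)


# Hypothetical cohort: ten pixels first mapped as logged in 1996.
cohort_1996 = set(range(10))

# Hypothetical cumulative deforestation (pixel ids cleared by each year).
# Pixel 42 lies outside the cohort and therefore does not affect the result.
deforested_by = {2003: {0, 1}, 2014: {0, 1, 2, 3, 42}}

curve = {year: survivorship(cohort_1996, cleared)
         for year, cleared in deforested_by.items()}
print(curve)
```

Tracking each cohort separately is what allows short-term (about 4 years) and long-term (18 years or more) survivorship to be compared.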
A large spatial overlay analysis to under- 
stand the co-occurrence of the different types 
of degradation follows naturally from the 
persistence analysis. We examined overlays 
of all degraded forest that were not deforested 
through 2014. The results were somewhat 
unexpected, in that spatial co-occurrence 
of different degradation factors is very low. 
Throughout the BA, it is common to have 
all four types of degradation occur, but there 
is no evidence that they overlap in any signif- 
icant way, a finding that has implications for 
degradation intensity and our understanding 
of the interaction between drivers. Of all de- 
graded forest in the current landscape, 90% 
has been degraded by only one factor (table S9). 
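
The share of degraded forest affected by a single factor can be obtained by counting, for every degraded pixel, how many degradation layers include it. The example below uses made-up pixel identifiers and is only a sketch of that counting step; its result is unrelated to the 90% reported in the text.

```python
from collections import Counter


def single_factor_share(layers):
    """Percentage of degraded pixels that appear in exactly one layer."""
    hits = Counter()
    for pixels in layers.values():
        hits.update(pixels)  # one count per layer containing the pixel
    single = sum(1 for n in hits.values() if n == 1)
    return 100.0 * single / len(hits)


# Hypothetical layers: pixel ids flagged for each degradation type.
toy_layers = {
    "logged": {1, 2, 3},
    "burned": {3, 4},
    "edge": {5, 6, 7},
    "isolated": {7, 8},
}

share = single_factor_share(toy_layers)
print(f"{share:.0f}% of degraded pixels have a single degradation factor")
```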

This analysis considered the density, domi- 
nance, direction, and duration of five types 


Fig. 4. Maps of dominant drivers of degradation. Four types of degraded forest are shown: logged, 
understory burning, edges, and isolated forest fragments. The quantitatively most abundant type in each 
200-km² grid cell is the dominant driver at that local level. The color represents the most dominant 
type, whereas the tonal gradient indicates how dominant it is compared with other types, as a percentage 
of all types present. If all four types existed in approximately equal magnitude, the tone would be close 
to 25%, whereas the color tone would be darker and closer to 100% if there was only one type present. 
(A) The overall status of dominant types cumulatively through the entire time period, 1992 to 2014. (B) The 
dominant type at the start of the period of analysis, observed in 1992. (C) The dominant type at the end 
of the period of analysis, observed in 2014. 

of forest degradation over almost three decades. 
Edge effects and isolated forest fragments 
have been substantial contributors to 
the current degraded state of forest over the 
record, but the fragmentation contribution 
is declining while overt degradation from 
logging, and to a lesser degree fire, is becom- 
ing more prominent. We observed that there 
has been a transition from a deforestation- 
mediated fragmentation regime to one with 
an elevated importance of new logging and 
fire, which is geographically shifting to a new 
western “degradation frontier.” We can artic- 
ulate a simple framework for understanding 
these dynamics by considering two broad cat- 
egories of forest degradation types: (i) those 
that are dependent on, or coupled to, defor- 
estation, such as the fragmentation effects 
of edges and isolated fragments (DD); and 
(ii) those that are more independent of, or 
decoupled from, deforestation, such as log- 
ging and to a lesser degree understory fire 
(ID). 

National policies in Brazil have been es- 
tablished in a command-and-control fashion 
to reduce the rate of deforestation, and they 
have been effective. In turn, such deforesta- 
tion policies have influenced rates of DD forest 
degradation. However, these policies have had 
minimal effect on curbing ID degradation and 
have led to more persistent and long-lasting 
ID degradation in the landscape. Furthermore, 
annual rates of ID degradation now exceed 
deforestation rates, while being geographi- 
cally dispersed to new frontiers not associated 
with the historical deforestation frontier along 
the so-called arc of deforestation. With either 
the current policy situation or a return to lais- 
sez faire policies that ignore degradation gen- 
erally and ID degradation specifically, the rate 
and extent of forest degradation will likely 
increase in the future in response to market 
forces and the establishment of a separate 
logging sector infrastructure for extraction, 
processing, and transport. Selective logging has 
always been one of the first entryways into un- 
disturbed forests, as it occurs within close prox- 
imity of existing settlement and clearing. Now, 
logging is demonstrating the potential to leap 
further distances into remote areas. 

Several of our analytical assumptions and 
methodological features suggest that our esti- 
mates are conservative. Our buffer distance for 
edges is 120 m, and we did not estimate edges 
around logged and burned areas. Our logging 
detection does not include very-low-intensity 
logging below 10 m³ ha⁻¹, so it may omit 
some cases of reduced-impact logging. We also 
did not include highly selective individual 
tree logging, which occurs in the process of 
deforestation or tree removals by individual 
farmers on their homesteads, or indigenous 
logging. The periodic use of OYs may miss 
some low-intensity logging or small burning 
events. Inclusion of these factors would only 
increase the estimate of how much degrada- 
tion exists in the landscape today. 

The overall conclusion from this work is that 
forest degradation is a significant form of land- 
scape and ecosystem disturbance. Degradation 
in the BA is a persistent form of disturbance, 
not simply one that is eventually replaced by 
deforestation. Focusing attention on deforest- 
ation alone ignores an additional area of forest 
degraded by selective logging, understory fire, 
edge effects, and isolation of fragments that is 
equal in areal extent to cleared forest. 

Improved long-term spatial data on forest 
degradation are sought by most multilateral 
environmental agreements. Our analysis pro- 
vides a cogent example of monitoring data 
needed to estimate species loss from forest 
fragmentation and degradation, which is a 
key element of Target 5 of the United Nations 
Convention on Biological Diversity. Our results 
align with long-term ground-based studies of 
forest fragmentation in conservation biology 
(32, 33) and contribute to a better understand- 
ing of species biodiversity loss (34, 35). Our 
measurements reemphasize the importance 
of technical consideration of forest degrada- 
tion in the international dialog on REDD+, for 
which most monitoring has been focused on 
deforestation. 

Much of our understanding of Earth’s past climate comes from the measurement of oxygen and 
carbon isotope variations in deep-sea benthic foraminifera. Yet, long intervals in existing records lack 
the temporal resolution and age control needed to thoroughly categorize climate states of the Cenozoic 
era and to study their dynamics. Here, we present a new, highly resolved, astronomically dated, 
continuous composite of benthic foraminifer isotope records developed in our laboratories. Four 
climate states—Hothouse, Warmhouse, Coolhouse, Icehouse—are identified on the basis of their 
distinctive response to astronomical forcing depending on greenhouse gas concentrations and polar 
ice sheet volume. Statistical analysis of the nonlinear behavior encoded in our record reveals the key 
role that polar ice volume plays in the predictability of Cenozoic climate dynamics. 


Global changes in Earth’s climate during 
the Cenozoic era, the last 66 million years, 
have long been inferred from stable-isotope 
data in carbonate shells of benthic 
foraminifera, which are single-celled 
amoeboid organisms that live on the seafloor. 
Stable carbon and oxygen isotope records from 
deep-sea benthic foraminifera are a proven, 
invaluable archive of long-term changes in Earth’s 
carbon cycle, deep-sea temperature, and seawater 
composition driven by changes in ice 
volume (1, 2). In 1975, Shackleton and Kennett 
(3) produced one of the first deep-sea benthic 
foraminifer stable isotope records of the Ceno- 
zoic. Despite being of low temporal resolution, 
it revealed that Earth’s climate had transitioned 
from a warm state 60 to 40 million years ago 
(Ma) to a cool state 10 to 5 Ma. Over the last 
45 years, many deep-sea benthic foraminifer 
stable-isotope records of variable length and 
quality have been developed, resulting in a more 
detailed record of Cenozoic climate change. 
Compilations of these deep-sea isotope records 
provide a compelling chronicle of past trends, 
cyclic variations, and transient events in the 
climate system from the Late Cretaceous to 
today (1, 4-10). However, even the most recent 
benthic isotope compilations cannot accurately 
document the full range and detailed characteristics 
of Cenozoic climate variability on 
time scales of 10 thousand to 1 million years. 
Age models and temporal resolution of Ceno- 
zoic benthic isotope compilations are too 
coarse and/or include gaps, particularly before 
34 Ma. These weaknesses hamper progress in 
determining the dynamics of the Cenozoic 
climate system (4, 9, 11), for example, because 
they prohibit application of advanced tech- 
niques of nonlinear time series analysis at the 
required (astronomical) time scales. The lack 
of highly resolved, continuous, and accurately 
dated records constitutes a key limitation in our 
ability to identify and understand the charac- 
teristics of Earth’s evolving climate during the 
Cenozoic. 

Here, we present a new astronomically tuned 
deep-sea benthic foraminifer carbon (δ¹³C) and 
oxygen (δ¹⁸O) isotope reference record uniformly 
covering the entire Cenozoic, developed in our 
laboratories by using sediment archives re- 
trieved by the International Ocean Discovery 
Program and its predecessor programs (Fig. 1). 
To produce this composite record, we selected 
14 ocean drilling records, checked and revised 
their composite splices if necessary, and pre- 
ferentially selected records using the genera 
Cibicidoides and Nuttallides to minimize systematic 
interspecies isotopic offsets (1, 4, 12, 13). 
We additionally generated new benthic stable-isotope 
data spanning the late Miocene and 
middle to late Eocene to fill intervals inadequately 
covered by existing records. We collated 
existing astrochronologies for all records, recal- 
ibrated them to the La2010b orbital solution 
(14) if required, and developed an astrochro- 
nology for the middle to late Eocene (13). We 
estimate our chronology to be accurate to 
±100 thousand years (kyr) for the Paleocene 
and Eocene, ±50 kyr for the Oligocene to middle 
Miocene, and ±10 kyr for the late Miocene to 
Pleistocene. The composite record is affected 
by some spatial biases arising from the uneven 
distribution of deep-sea stable isotope data that 
mainly derive from low to mid-latitudes (13). 
Nevertheless, the resulting Cenozoic Global 
Reference benthic foraminifer carbon and oxy- 
gen Isotope Dataset (CENOGRID) provides a 
refined record with higher signal-to-noise ratio 
than any previous compilations (13) (supple- 
mentary text S1) and better coverage of the 
Paleocene, Eocene, and late Miocene intervals 
(fig. S32). The CENOGRID serves as an astro- 
nomically tuned, high-definition stratigraphic 
reference of global climate evolution for the 
past 66 million years. 

On time scales of 10 thousand to 1 million 
years, global climate is a complex, dynamical 
system responding nonlinearly to quasi-periodic 
astronomical forcing. By combining the latest 
high-resolution generation of Cenozoic deep- 
sea isotope records on a highly accurate time 
scale, CENOGRID enables the definition of 
Earth’s fundamental climates and investigation 
of the predictability of their dynamics. We used 
recurrence analysis (RA) of the CENOGRID 
record (13, 15) to identify fundamental climate 
states that internally share characteristic and 
statistically distinctive dynamics. Recurrence 
is a major property of dynamical systems, and 
RA provides information about nonlinear dy- 
namics, dynamical transitions, and even non- 
linear interrelationships (15) and facilitates 
evaluation of underlying dynamical processes— 
e.g., whether they are stochastic, regular, or 
chaotic. We present recurrence plots and their 
quantification of the benthic foraminifer 
δ¹³C and δ¹⁸O records to recognize different 
climate states and apply the RA measure of 
“determinism” (DET) to quantify the pre- 
dictability of Cenozoic climate dynamics. 
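
As a rough illustration of how blocklike structures arise in a recurrence plot, the sketch below builds a binary recurrence matrix for a synthetic two-regime series. It is a minimal sketch under stated assumptions, not the procedure applied to CENOGRID: the threshold `eps`, the absence of any time-delay embedding, the helper names, and the synthetic "warm" and "cold" values are all hypothetical.

```python
import math


def recurrence_matrix(series, eps):
    """Binary recurrence matrix: R[i][j] = 1 when |x_i - x_j| <= eps."""
    n = len(series)
    return [[1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]


def block_recurrence_rate(R, rows, cols):
    """Fraction of recurrent pairs inside the block rows x cols."""
    hits = sum(R[i][j] for i in rows for j in cols)
    return hits / (len(rows) * len(cols))


# Synthetic stand-in for an isotope record (hypothetical values): a "warm"
# regime followed by a "cold" regime, each oscillating around its own mean.
warm = [0.5 + 0.1 * math.sin(0.8 * k) for k in range(40)]
cold = [3.0 + 0.1 * math.sin(0.8 * k) for k in range(40)]
record = warm + cold

R = recurrence_matrix(record, eps=0.3)
first, second = range(0, 40), range(40, 80)
within_warm = block_recurrence_rate(R, first, first)
within_cold = block_recurrence_rate(R, second, second)
cross = block_recurrence_rate(R, first, second)
print(within_warm, within_cold, cross)
```

In this toy case every pair of points within a regime recurs and no pair across regimes does, so the matrix shows two filled squares on the diagonal and empty off-diagonal blocks, which is the pattern read as a system "trapped" in a state.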

Four distinctive climate states emerge as 
separate blocks from our recurrence plots of the 
δ¹⁸O CENOGRID record, which we designate as 
the Hothouse, Warmhouse, Coolhouse, and Ice- 
house states (Fig. 2). Blocklike structures in the 
recurrence plots identify epochs where the 
dynamical system is “trapped” in a particular 
state. This interpretation of Cenozoic climate 
history is broadly consistent with previous in- 
terpretations, but our recurrence plot analysis 
of the highly resolved CENOGRID data pro- 
vides a more statistically robust and objective 
exposition of events. 

Fig. 1. Cenozoic Global Reference benthic foraminifer carbon and oxygen 
Isotope Dataset (CENOGRID) from ocean drilling core sites spanning the 
past 66 million years. Data are mostly generated by [...] tests of the taxa 
Cibicidoides and Nuttallides extracted [...] sea sediments drilled during 
Ocean Drilling Program (ODP) and Integrated Ocean Drilling Program (IODP) 
expeditions. Genus-specific corrections were applied and oxygen isotope 
data adjusted by +0.64‰ and +0.4‰, respectively (12), with the green dot 
indicating the average oxygen isotope composition of the last 10 kyr. 
Average resolution for the interval from 0 to 34 Ma is one sample every 
2 ky; for the interval from 34 to 67 Ma, it is on [...] After binning, data 
were resampled and smoothed by [...] over 20 kyr (blue curve) and 1 Myr 
(red curve) to accent [...] and trends in Earth's carbon cycle and 
temperature operating on various time scales. Oxygen isotope data have 
been converted to average te [...] 

Characteristic features of the four climate 
states can be inferred from the isotope pro- 
files (Fig. 1) and scatterplots of the CENOGRID 
δ¹³C and δ¹⁸O data and from atmospheric CO₂ 
concentration estimates (Fig. 2) (13). Warm- 
house and Hothouse states prevailed from 
the Cretaceous/Paleogene boundary (K/Pg, 
66 Ma) to the Eocene-Oligocene Transition 
(EOT, 34 Ma). During the Warmhouse, global 
temperatures were more than 5°C warmer 
than they are today (13), and benthic δ¹³C 
and δ¹⁸O show a persistent positive correlation 
with one another. The Hothouse operated be- 
tween the Paleocene-Eocene Thermal Maxi- 
mum at 56 Ma and the end of the Early 
Eocene Climate Optimum (EECO) at 47 Ma 
(16), when temperatures were more than 10°C 
warmer than they are today and displayed 
greater amplitude variability. Transient warm- 
ing events (hyperthermals) are an intrinsic fea- 
ture of the Hothouse, wherein paired negative 
excursions in δ¹³C and δ¹⁸O reflect warming 
globally through rapid addition of carbon to 
the ocean-atmosphere system. The two Warm- 
house phases from 66 to 56 Ma (Paleocene) 
and 47 to 34 Ma (middle-late Eocene) share a 
similar temperature range but have distinct 
background δ¹³C isotope values and atmo- 
spheric CO₂ concentrations (Fig. 2 and fig. 
S35). At the EOT, the Warmhouse transi- 
tioned into the Coolhouse state, marked by a 
stepwise, massive drop in temperature and a 
major increase in continental ice volume with 
large ice sheets appearing on Antarctica (17) 
to establish a unipolar glacial state (18). The 
recurrence plots mark out the EOT as the most 
prominent transition of the whole Cenozoic, 
which highlights the important role of ice 
sheets in modulating Earth’s climate state 
(fig. S33) (13). 

Fig. 1 (continued). [...] respect to today (13). Future projections for global 
temperature (44) in the year 2300 are shown by plotting th [...] 

The Coolhouse state spans ~34 Ma (EOT) 
to 3.3 Ma (mid-Pliocene M2 glacial) and is 
divided into two phases by the marked 
increase in δ¹⁸O at 13.9 Ma related to the 
expansion of Antarctic ice sheets during the 
middle Miocene Climate Transition (mMCT) 
(19). Warmer conditions culminating in the 
Miocene Climatic Optimum (MCO; ~17 to 14 Ma) 
(20) characterize the first phase, followed by 
cooling and increasing δ¹⁸O during the second 
phase (Fig. 2). RA of carbon isotope data 
documents an additional major transition in 
the carbon cycle around 7 Ma related to the 
end of the late Miocene carbon isotope shift 
(11, 21, 22). A major change in the correlation 
between benthic foraminifer δ¹³C and δ¹⁸O 
occurs during the Pliocene epoch (23). The 
Icehouse climate state (Fig. 2), driven by the 
appearance of waxing and waning ice sheets in 
the Northern Hemisphere, was fully established 
by the Pliocene-Pleistocene transition (24) (Figs. 
1 and 2) with Marine Isotope Stage M2 at 3.3 Ma 
being a possible harbinger. The recurrence plots 
are less pronounced and more transparent from 
3.3 Ma to today (Fig. 2 and fig. S34), suggesting 
that Earth’s climate cryosphere dynamics entered 
a state not comparable to anything seen in the 
preceding 60 or more million years. 

Fig. 2. Climate states of the Cenozoic. Deep-sea benthic foraminifer high-resolution 
carbon (A) and oxygen (B) isotope records and the respective recurrence plots as 
well as scatterplots of long-term benthic foraminifer carbon vers [...] 
(C) and oxygen values versus atmospheric CO₂ concentrations ( [...] 
analysis compares climate change patterns occurring in a specifi [...] 
entire record. If climate dynamics have similar patterns, they will [...] 
areas in the plot; if they have no common dynamics, the plot will [...] 
distinct climate states can be identified as Hothouse, Warmhous [...] 
Icehouse with distinct transitions among them. The relation of oxygen isotopes, 
representative for average global temperature trends, to atmospheric CO₂ 
concentrations suggests that the present climate system as of today [415 parts per 
million (ppm) CO₂] is comparable to the Miocene Coolhouse close to the MCO. If 
CO₂ emissions continue unmitigated until 2100, as assumed for the RCP8.5 
scenario, Earth's climate system will be moved abruptly from the Icehouse into the 
Warmhouse or even Hothouse climate state. LGM, Last Glacial Maximum; MECO, 
Middle Eocene Climate Optimum; PETM, Paleocene/Eocene Thermal Maximum. 
[Axis label: pCO₂ (ppm)] 

[...] four climate states. Frequencies between 2 and 60 cycles per million years are related to changes in 
Earth's orbital parameters, known as Milankovitch cycles. The FFT spectrograms were computed with a 
5-Myr window on the detrended records of benthic carbon and oxygen isotope data. From 67 to 13.9 Ma, 
cyclic variations in global climate are dominated by the eccentricity cycles of 405 and 100 kyr. 
Thereafter, in particular in the oxygen isotope record, the influence of obliquity increased, dominating 
the rhythm of climate in the record younger than ~7.7 Ma. Recurrence analysis of determinism (DET) 
shows that climate in the Warmhouse state is more deterministic (predictable) than in the Hothouse, 
Coolhouse, and Icehouse. From 47 Ma toward the EOT at 34 Ma, climate dynamic changes are rising in 
amplitude, approaching a threshold in the climate system. If DET tends to low values, the dynamics are 
stochastic, whereas high values represent deterministic dynamics. 

The CENOGRID allows us to scrutinize the 
state dependency of climate system response 
to CO₂ and astronomical forcing on time 
scales of 10 thousand to 1 million years (13). 
Astronomical forcing throughout the Cenozoic 
is consistently uniform, but the RA indicates 
that the nonlinear response in climate varia- 
bility to this forcing is strongly influenced by 
the fundamental state of climate. Evolutionary 
spectrograms characterize the dominant cli- 
matic response to astronomical forcing during 
the Cenozoic (Fig. 3). We find that the prevail- 
ing climate state, as characterized by atmo- 
spheric CO₂ concentration and polar ice 
sheets, orchestrates the response of climate 
processes to astronomical forcing. Modeled 
insolation-driven global temperature varia- 
bility on astronomical time scales suggests 
that different temperature-response regimes 
exist: Eccentricity dominates temperature re- 
sponses in low latitudes, precession in mid- 
latitudes, and obliquity in high latitudes (25). 
Thus, pronounced astronomical cyclicity in 
the CENOGRID could reflect climate state- 
dependent amplifications of latitude-specific 
climate processes. 
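
To make the idea of an evolutionary spectrogram concrete, the following sketch slides a fixed window along a synthetic record and compares the power at a few orbital periods in each window. It is only an illustration under stated assumptions: the 5-Myr window and the 405- and 100-kyr eccentricity periods are taken from the text, whereas the 41-kyr obliquity period is the standard textbook value, the 10-kyr sampling step and the synthetic series are hypothetical, mean removal stands in for whatever detrending was actually used, and a direct Fourier sum at selected frequencies replaces a full FFT.

```python
import cmath
import math


def band_power(window, dt_myr, freq_cpm):
    """Power at one frequency (cycles per Myr) in a mean-removed window."""
    mean = sum(window) / len(window)
    acc = 0j
    for k, x in enumerate(window):
        acc += (x - mean) * cmath.exp(-2j * math.pi * freq_cpm * k * dt_myr)
    return abs(acc) ** 2 / len(window) ** 2


def evolutionary_spectrum(series, dt_myr, window_myr, step_myr, freqs_cpm):
    """Return (window start in Myr, [power per frequency]) for each window."""
    n_win = round(window_myr / dt_myr)
    n_step = round(step_myr / dt_myr)
    rows = []
    for start in range(0, len(series) - n_win + 1, n_step):
        window = series[start:start + n_win]
        rows.append((start * dt_myr,
                     [band_power(window, dt_myr, f) for f in freqs_cpm]))
    return rows


dt = 0.01                                  # hypothetical 10-kyr sampling step
periods_kyr = [405.0, 100.0, 41.0]         # eccentricity (2x), obliquity
freqs = [1000.0 / p for p in periods_kyr]  # cycles per Myr

# Synthetic 20-Myr record: a 100-kyr cycle in the first half and a
# 41-kyr cycle in the second half (hypothetical, for illustration only).
series = []
for k in range(2000):
    t = k * dt
    period_myr = 0.100 if k < 1000 else 0.041
    series.append(math.sin(2 * math.pi * t / period_myr))

spectrum = evolutionary_spectrum(series, dt, 5.0, 5.0, freqs)
dominant = [periods_kyr[max(range(len(freqs)), key=lambda i: power[i])]
            for _, power in spectrum]
print(dominant)
```

Each entry of `dominant` names the period with the most power in one 5-Myr window, so a change in the dominant band along the list mimics, in a very simplified way, the shift from eccentricity-dominated to obliquity-dominated variability.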

In the Hothouse and Warmhouse, as well as 
the first Coolhouse phase, eccentricity-related 
cycles dominate the CENOGRID records, indi- 
cating a strong influence of low-latitude pro- 
cesses on climate variations. Obliquity-related 
cycles are sparse in these intervals but have 
been documented in other geochemical records 
(26, 27), exhibiting perhaps local lithological 
responses. Weak response in the obliquity band 
during the Hothouse and Warmhouse intervals 
might be related to the absence of a high- 
latitude ice sheet that could have amplified 
climate response to obliquity forcing. The 
driving mechanism for the prevailing eccen- 
tricity cyclicity in the benthic δ¹³C and δ¹⁸O 
records is still unknown, but modeling sug- 
gests that low- and mid-latitude processes in 
the climate system respond in a nonlinear way 
to insolation forcing (25, 28-30). In this regard, a 
key feedback likely involves the hydrological 
cycle with highly seasonal precipitation pat- 
terns during intervals of strong monsoon re- 
sponse to precession-induced insolation change, 
which could play a major role in the global 
distribution of moisture and energy (31-34). The 
expression of precession is apparently weak in 
the CENOGRID composite record, despite the 
dominant eccentricity forcing, likely owing to 
the long residence time of carbon in the oceans 
enhancing longer forcing periods (30, 35), as 
well as our strategy to avoid “overtuning” the 
record. After the increasing influence of high- 
latitude cooling and ice growth during the 
second Coolhouse phase, the obliquity-band 
response steadily increases after the mMCT 
before dominating climate dynamics by the 
late Miocene-early Pliocene (11, 22, 36). In the 
Icehouse state, the progressive decrease in 
atmospheric CO₂ and major growth of polar 
ice sheets, which enhanced variability in δ¹⁸O, 
steadily amplified the influence of complex 
high-latitude feedbacks until they essentially 
dominated climate dynamics. 

To better understand the complexity of 
climate dynamics recorded in the CENOGRID, 
we computed the RA measure of DET (13). 
This parameter quantifies the predictability 
of dynamics in a system's state. Predictability 
estimates the stochastic (unpredictable) ver- 
sus the deterministic (predictable) nature of 
climate dynamics recorded in CENOGRID (13). 
DET values near zero correspond to unpredictable 
dynamics, whereas large values indicate pre- 
dictable dynamics, which are especially interest- 
ing to examine on the approach to tipping 
points. Changes in DET can thus reveal tran- 
sitions between fundamentally different climate 
regimes. 
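
A minimal sketch of how a DET value can be computed from a recurrence matrix is given below. It uses the standard recurrence-quantification definition (the share of recurrence points that lie on diagonal lines of at least a minimum length) and should be read as an illustration, not as the exact configuration used here: the threshold `eps`, the minimum line length `lmin`, the lack of embedding, and both test series are assumptions.

```python
import random


def recurrence_matrix(series, eps):
    """Binary recurrence matrix: R[i][j] = 1 when |x_i - x_j| <= eps."""
    n = len(series)
    return [[1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]


def determinism(R, lmin=2):
    """Share of recurrence points on diagonal lines of length >= lmin.

    Only diagonals above the main one are scanned; the matrix is symmetric
    and the main diagonal (every point recurring with itself) is excluded.
    """
    n = len(R)
    total = 0
    in_lines = 0
    for k in range(1, n):
        run = 0
        for i in range(n - k):
            if R[i][i + k]:
                total += 1
                run += 1
            else:
                if run >= lmin:
                    in_lines += run
                run = 0
        if run >= lmin:
            in_lines += run
    return in_lines / total if total else 0.0


eps = 0.02  # hypothetical recurrence threshold

# A strictly periodic (regular) series and an uncorrelated random series.
periodic = [(k % 20) / 20 for k in range(200)]
rng = random.Random(0)
noise = [rng.random() for _ in range(200)]

det_periodic = determinism(recurrence_matrix(periodic, eps))
det_noise = determinism(recurrence_matrix(noise, eps))
print(det_periodic, det_noise)
```

The periodic series recurs only along complete diagonals, so all of its recurrence points sit on long lines, whereas recurrences in the random series are mostly isolated points and yield a much lower value.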

Our RA suggests that climate dynamics 
during the Warmhouse and Hothouse Ceno- 
zoic states are more predictable or more reg- 
ular than those of the Coolhouse and Icehouse 
states (Fig. 3). The growth of polar ice sheets 
at the EOT enhanced the effect of obliquity 
pacing of high-latitude climate that interacted 
with eccentricity-modulated precession forc- 
ing at lower latitudes from that point in time. 
This led to increased nonlinear interactions 
among astronomically paced climate processes 
and, thus, more complex, stochastic climate 
dynamics. The development of a large Antarctic 
ice volume at the inception of the Coolhouse 
is associated with a fundamental regime change 
toward less predictable climate variability 
(lower DET values calculated from benthic 
δ¹⁸O) (Fig. 3). From 25 to 13.9 Ma DET is ele- 
vated again, related to a reduction in ice volume 
in relatively warmer times of the Coolhouse, 
culminating in the MCO. Despite the grow- 
ing influence of ice sheets in the Coolhouse, 
until ~6 to 7 Ma, carbon-cycle dynamics re- 
main more deterministic than temperature 
because δ¹³C variations are predominantly 
driven by low-latitude processes and less 
strongly influenced by the complex interaction 
with polar ice-sheet fluctuations. After ~6 Ma 
DET drops, likely because of a stronger cryo- 
sphere imprint on the carbon cycle. Upon ini- 
tiation of the Icehouse at 3.3 Ma, δ¹⁸O-recorded 
climate dynamics become slightly more deter- 
ministic (37) and carbon-cycle dynamics un- 
predictable, likely resulting from the complex 
response to the waxing and waning of polar 
ice caps (38). 

The CENOGRID spectrogram displays a 
broader frequency range during several intervals 
with low DET values (e.g., Coolhouse), whereas 
high DET values (e.g., Warmhouse) occur when 
single frequencies dominate (Fig. 3). This could 
be signaling a more direct response to astro- 
nomical forcing in the Warmhouse compared 
with that in the Coolhouse. Our RA suggests 
that the Hothouse is more stochastic (less 
predictable) than the Warmhouse, presumably 
induced by the occurrence of extreme hyper- 
thermal events and their strong nonlinear and 
much-amplified climate response to astronom- 
ical forcing (39, 40). The evolving pattern in the 
DET from the onset of cooling after the EECO 
to the EOT is pronounced (Fig. 3). The am- 
plitude in fluctuations between stochastic 
and deterministic dynamics intensifies from 
49 Ma to 34 Ma, consistent with how Earth’s 
climate system is suggested to behave (41, 42) 
as it moves toward a major tipping point. Once 
that tipping point is reached at the EOT, a rapid 
shift toward more permanently stochastic dy- 
namics marks the inception of a new climate 
state (43). Thus, not only is polar ice volume 
critical to defining Earth’s fundamental cli- 
mate state, it also seems to play a crucial role 
in determining the predictability of its clima- 
tological response to astronomical forcing. 

CIRCADIAN RHYTHMS 


The hepatocyte clock and feeding control 
chronophysiology of multiple liver cell types 


Dongyin Guan, Ying Xiong, Trang Minh Trinh, Yang Xiao, Wenxiang Hu, Chunjie Jiang, 
Pieterjan Dierickx, Cholsoon Jang, Joshua D. Rabinowitz, Mitchell A. Lazar 


Most cells of the body contain molecular clocks, but the requirement of peripheral clocks for 
rhythmicity and their effects on physiology are not well understood. We show that deletion of core 
clock components REV-ERBα and REV-ERBβ in adult mouse hepatocytes disrupts diurnal rhythms 
of a subset of liver genes and alters the diurnal rhythm of de novo lipogenesis. Liver function is 
also influenced by nonhepatocytic cells, and the loss of hepatocyte REV-ERBs remodels the rhythmic 
transcriptomes and metabolomes of multiple cell types within the liver. Finally, alteration of food 
availability demonstrates the hierarchy of the cell-intrinsic hepatocyte clock mechanism and the 
feeding environment. Together, these studies reveal previously unsuspected roles of the hepatocyte 
clock in the physiological coordination of nutritional signals and cell-cell communication controlling 
rhythmic metabolism. 


Biological rhythms are intricately involved 
in sleeping-waking, feeding-fasting, and 
activity-rest phenomena, and they are 
essential to maintaining physiological 
homeostasis (1). The mammalian core 
clock includes transcriptional activators BMAL1/ 
CLOCK and transcriptional repressors REV- 
ERBα and REV-ERBβ that function in inter- 
locked transcriptional feedback loops (2). 
Central clocks in the suprachiasmatic nucleus 
(SCN) are believed to synchronize clocks in 
peripheral tissues (3), and dyssynchrony of 
this system is associated with metabolic dys- 
function (4, 5). Nevertheless, major questions 
remain as to how the environment and ge- 
netic factors control the clocks in peripheral 
tissues and whether communication exists 
between clocks in different cell types within 
an organ. 

To dissect the cell-autonomous and non- 
autonomous regulation of diurnal rhythms in 
peripheral tissues, we focused on the liver, a 
metabolic hub (6). REV-ERBα and REV-ERBβ 
were specifically deleted in hepatocytes 
(HepDKO; DKO, double knockout) by inject- 
ing the AAV8-TBG-CRE virus into adult 
REV-ERBα/β floxed mice. This model excludes 
developmental effects and potential confound- 
ing due to direct manipulation of the clock in 
other tissues (7, 8). Expression of both REV- 
ERBα and REV-ERBβ was nearly undetectable 
after 2 weeks, even at zeitgeber time 10 (ZT10), 
when REV-ERBs were highly expressed at the 
mRNA (Fig. 1A) and protein level (fig. S1). 
REV-ERBs physiologically repress Bmal1 and 
Npas2 in a circadian manner (9, 10), and 
both of these genes were constitutively dere- 
pressed in the REV-ERB HepDKO (Fig. 1B). 
Other core clock genes also demonstrated re- 
duced rhythmicity (Fig. 1B). 

We next examined the effect of REV-ERB 
HepDKO on the liver rhythmic transcriptome. 
Two weeks after adeno-associated virus (AAV) 
treatment, RNA sequencing (RNA-seq) per- 
formed on livers harvested every 3 hours re- 
vealed the attenuation of the rhythmicity of 
a large group of transcripts that were highly 
rhythmic in the controls, including genes 
involved in diurnal rhythm pathways such 
as Bmal1, Npas2, and Clock (Fig. 1C, fig. S2A, 
and table S1A). This observation fits the pre- 
vailing hypothesis that REV-ERBs are major 
controllers of the clock and suggests that the 
rhythmic expression of these genes depends 
on the intrinsic core clock feedback loop. 
Many genes, however, maintained diurnal 
rhythmicity in the absence of REV-ERBs 
(Fig. 1D, fig. S2B, and table S1B). Among 
these were ~170 genes, enriched for lipid 
metabolism, that showed enhanced rhyth- 
mic amplitudes (Fig. 1E, fig. S2C, and table 
S1C). KEGG (Kyoto Encyclopedia of Genes 
and Genomes) and gene set enrichment analy- 
sis (GSEA) indicated that rhythmic transcripts 
regulated by REV-ERBs were involved in circa- 
dian rhythms, hormone secretion, and lipid 
metabolism (fig. S2, A to D). These results 
indicated an unexpected rhythmic transcrip- 
tomic reprogramming in the liver upon the 
depletion of REV-ERBs in adult hepatocytes. 
Importantly, rhythmic locomotor activity (fig. 
S3A), feeding (fig. S3B), and plasma insulin 
levels (fig. S3C) were not much affected in REV- 
ERB HepDKO mice, indicating that disrup- 
tion of the hepatocyte clock did not affect 
diurnal activity and feeding responses and 
excluding the possibility that the remodeling 
of the liver rhythmic transcriptome was due 
to changes in behavior. 
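
The grouping of transcripts into rhythm-disrupted, rhythm-retained, and rhythm-enhanced sets described above can be sketched as a simple filter on per-transcript rhythmicity statistics. This is a hedged illustration only: the cutoffs for adjusted P, period, and peak-to-trough ratio follow the thresholds quoted in the Fig. 1 legend, but the inclusive period bounds, the amplitude-gain cutoff `gain_min`, the extra labels, the data structure, and the example values are all hypothetical and are not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class RhythmStats:
    """Summary statistics for one transcript in one genotype."""
    adj_p: float      # adjusted P value from a rhythmicity test
    period_h: float   # estimated period in hours
    peak: float       # peak expression
    trough: float     # trough expression (must be > 0)


def is_rhythmic(s, p_max=0.01, ratio_min=2.0):
    # Period bounds are treated as inclusive here (an assumption).
    return (s.adj_p < p_max
            and 21.0 <= s.period_h <= 24.0
            and s.trough > 0
            and s.peak / s.trough > ratio_min)


def classify(control, hepdko, gain_min=1.5):
    """Label one transcript; gain_min is a hypothetical amplitude cutoff."""
    in_control, in_dko = is_rhythmic(control), is_rhythmic(hepdko)
    if in_control and not in_dko:
        return "disrupted"
    if in_control and in_dko:
        gain = (hepdko.peak / hepdko.trough) / (control.peak / control.trough)
        return "enhanced" if gain >= gain_min else "retained"
    return "gained" if in_dko else "not rhythmic"


# Hypothetical transcripts: (control statistics, HepDKO statistics).
examples = {
    "transcript_1": (RhythmStats(0.001, 24, 8.0, 2.0),
                     RhythmStats(0.40, 24, 5.0, 4.5)),
    "transcript_2": (RhythmStats(0.002, 24, 6.0, 2.0),
                     RhythmStats(0.003, 24, 6.3, 2.0)),
    "transcript_3": (RhythmStats(0.005, 24, 5.0, 2.0),
                     RhythmStats(0.001, 24, 12.0, 2.0)),
}
labels = {name: classify(c, k) for name, (c, k) in examples.items()}
print(labels)
```

Keeping the rhythmicity test separate from the comparison between genotypes makes it easy to swap in different cutoffs, such as the looser ones quoted for the isolated cell populations.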

Rhythmic expression of REV-ERB directly 
regulates many target genes by binding to 
ROR/REV-ERB-response element (RORE), 
where it represses transcription by recruiting 
co-repressors, by competing with ROR nuclear 
receptors, and by tethering to liver factors (11). 
ROR targets represented a high percentage 
of rhythm disrupted but not enhanced tran- 
scripts (fig. S3D), suggesting that REV-ERB’s 
direct binding was more relevant to the rhyth- 
mically disrupted transcripts. Rhythmic tran- 
scriptome remodeling was confirmed in livers 
expressing a REV-ERBα DNA binding domain- 
deficient mutant and lacking REV-ERBβ (fig. 
S3E) (12). 

To explore the transcriptional mechanism 
underlying rhythmic disruption in hepatocytes 
upon REV-ERB HepDKO, we used CistromeDB 
(13) to perform transcription factor (TF) bind- 
ing similarity screening based on all published 
liver cistromes. REV-ERBs and their co- 
repressors HDAC3 and NCOR1 were the top 
TFs bound near genes whose diurnal rhythm 
was disrupted by REV-ERB HepDKO (Fig. 1F 
and table S2A). The binding sites of BMAL1, 
PER2, and CRY1 were enriched in rhythm- 
retained transcripts, suggesting that systemic 
signals drive rhythmic gene expression via these 
core clock genes (14) (fig. S3F and table S2B). 

Although there is no REV-ERBα binding 
site near Srebf1, SREBF1 was the most enriched 
TF near genes whose diurnal rhythms were 
induced by the loss of REV-ERBs (Fig. 1G 
and table S2C). The rhythmic expression of 
Srebf1 was enhanced upon REV-ERB HepDKO, 
as was that of many of its target genes that are 
involved in de novo lipogenesis (DNL) (Fig. 1H), 
which is consistent with a previous REV-ERBα 
whole-body knockout mouse model (15). En- 
hanced diurnal rhythmic expression of Srebf1 
was also observed in livers from reverse phase 
feeding (RPF) Cry1−/−Cry2−/− mice (16), sug- 
gesting a general role of core clock TF repres- 
sors in maintaining the homeostasis of hepatic 
lipid metabolism. The physiological signifi- 
cance of this finding was assessed by directly 
measuring DNL using deuterated water as a 
tracer. Consistent with the enhanced rhythm 
of Srebf1 and the DNL pathway, the normal 
rhythm of DNL was markedly amplified in 
the livers of the REV-ERB HepDKO mice 
(Fig. 1I). This amplification was accompa- 
nied by an increase in the amplitude of plas- 
ma triglyceride rhythms, both on normal 
chow (Fig. 1J) and on high-fat, high-sucrose 
(HFHS) diet (Fig. 1K). Consistent with the 
increase in DNL, liver TG concentration was 
also increased in the livers of the HFHS-fed 
mice (Fig. 1L). Thus, REV-ERBs in hepatocytes 
are required to maintain lipid metabolism 
homeostasis. 

Fig. 1. Disruption of REV-ERBα and REV-ERBβ in hepatocytes remodels the 
liver diurnal rhythmic transcriptome and lipid metabolism. (A and B) Relative 
mRNA expression of Rev-erbα and Rev-erbβ (A) and REV-ERBs target genes 
(B) in control and HepDKO livers. (C to E) Heatmap of the relative expression 
of rhythm disrupted (C), retained (D), and enhanced (E) transcripts in control 
and HepDKO livers. The color bar indicates the scale used to show the expression 
of transcripts across eight time points, with the highest expression normalized 
to 1. JTK_CYCLE (48), adjusted P < 0.01, 21 hours < period (t) < 24 hours, peak- 
to-trough ratio > 2 (n = 3 mice per time point). (F and G) TF binding similarity 
screening on rhythm disrupted (F) and enhanced (G) transcripts based on all 
published liver cistromes from CistromeDB (13). (H) Relative mRNA expression of 
Srebf1 and its target genes in control and HepDKO livers (n = 4 to 6 mice per time 
point). (I) Incorporation of deuterated water into liver fatty acids was measured 
in mice 6 hours after oral gavage of D2O at ZT8 or ZT20. Data are presented 
as mean ± SEM. *P < 0.05 in Student's t test (n = 6 mice per group). (J) Serum 
triglyceride (TG) measurements in control and HepDKO mice. (K and L) Serum 
TG (K) and hepatic TG (L) measurements in HFHS-fed control and HepDKO mice. 
Data are presented as mean ± SEM (n = 4 to 6 mice per time point). 

Fig. 2. Hepatocyte REV-ERBs control nonhepatocytic diurnal rhythmic 
transcriptomes. (A) Uniform manifold approximation and projection (UMAP) 
visualization of liver cell clusters based on 18,239 single-cell 
transcriptomes. (B) The number of differentially expressed transcripts in 
hepatocytes (top) or nonhepatocytes (bottom) upon REV-ERBs HepDKO. 
(C) Relative mRNA expression of Rev-erbα, Rev-erbβ, and Bmal1 in isolated 
hepatocytes, ECs, and KCs from control and HepDKO livers. (D and E) 
Identification of diurnal rhythmic transcripts (D) and enhancers (E) in 
isolated ECs from control and HepDKO livers. JTK_CYCLE, adjusted P < 0.05, 
21 hours < period (t) < 24 hours, peak-to-trough ratio > 1.5. (F and I) Rose 
diagrams showing the prevalen [...] transcripts in each phase group, and 
motifs enriched at sites of rhythmic enhancers, which were correlated with 
rhythm disrupted (F) and enhanced (I) transcripts and enhancers from IMAGE 
in isolated ECs. (G and J) Correlation of mean expression of putative target 
genes and relative TF transcription activity in four phase groups in isolated 
ECs from control (G) and HepDKO (J) livers. In each plot, the bars represent 
the mean expression of putative TF target genes of each phase, and the black 
line represents the predicted TF relative transcription activity. Correlation 
coefficient r shows the strength of the relationship between the mean 
expression of putative TF target genes and relative transcription activity. 
(H and K) Expression level (normalized read counts) of Klf9 (H) and Gata4 (K) 
in isolated ECs from control and HepDKO livers. (L and M) Identification of 
diurnal rhythmic transcripts (L) and enhancers (M) in isolated KCs [...] 

Although hepatocytes are the most abun- 
dant cell type in the liver, the organ is com- 
posed of many other cell types that have 
critical roles in metabolic diseases (17-19). 
To better understand the effects of hepatocyte 
clock disruption, we performed single-nucleus 
RNA sequencing (sNuc-seq) on livers har- 
vested at ZT8, when REV-ERBs are highly 
expressed, from control and HepDKO mice. 
sNuc-seq avoided skewing the results against 
lipid-laden hepatocytes that may be lost be- 
cause of lysis or size exclusion during single- 
cell isolation, and ~3000 genes were detected 
per nucleus. On the basis of cell-specific mark- 
ers (fig. S4, A and B), populations of hepato- 
cytes, endothelial cells (ECs), Kupffer cells 
(KCs), stellate cells, and immune cells were 
clearly distinguishable, as were the subpop- 
ulations of hepatocytes corresponding to 
the previously defined markers of zonation 
(20) (Fig. 2A). As expected, many changes in 
gene expression were observed between con- 
trol and HepDKO hepatocytes (Fig. 2B), with 
about two-thirds of the changes being com- 
mon to hepatocytes in different zones (fig. 
S4C). The percentage of different cell popu- 
lations in the liver was largely unchanged (fig. 
S4D), but gene expression in nonhepatocyte 
cells in the REV-ERB HepDKO livers was 
markedly altered, with the largest number of 
changes observed in ECs (Fig. 2B). Consider- 
able changes were also noted in KCs, which 
are liver-resident macrophages that have crit- 
ical roles within the organ (17). Together, these 
two cell types were selected for more detailed 
studies. 

Fig. 3. Hepatocyte REV-ERBs regulate nonhepatocytic diurnal rhythmic metabolic process. (A and 
C) Metabolic pathway analysis integrating the enrichment of genes and metabolites in rhythm disrupted 
(A) and enhanced (C) transcripts and metabolites in isolated ECs. (B and D) Examples of rhythm disrupted (B) 
and enhanced (D) metabolites and related transcripts in ECs upon REV-ERBs DKO in hepatocytes. 
(E and G) Metabolic pathway analysis integrating the enrichment of genes and metabolites in rhythm 
disrupted (E) and enhanced (G) transcripts and metabolites in isolated KCs. (F and H) Examples of rhythm 
disrupted (F) and enhanced (H) metabolites and related transcripts in KCs upon REV-ERBs DKO in hepatocytes. 
Pathways were considered significant if P < 0.01 using hypergeometric test. Metabolites and transcripts data 
are presented as mean ± SEM (n = 3 or 4 mice per time point). 
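
The Fig. 3 legend states that pathways were called significant at P < 0.01 using a hypergeometric test. A generic over-representation calculation of that kind can be sketched as follows. This is an assumption-laden illustration, not the integrated gene and metabolite analysis used in the study: the function name, the background size, and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, overlap):
    """Upper-tail P(X >= overlap) for sampling without replacement.

    population: number of features in the background
    annotated:  number of background features belonging to the pathway
    drawn:      number of features in the list being tested
    overlap:    number of tested features that belong to the pathway
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(comb(annotated, k) * comb(population - annotated, drawn - k)
               for k in range(overlap, upper + 1))
    return tail / total


# Small case that can be checked by hand: (6*6 + 4*1) / 120 = 1/3.
p_small = hypergeom_enrichment_p(population=10, annotated=4, drawn=3, overlap=2)

# Hypothetical pathway: 50 of 20,000 background features annotated,
# 300 rhythm-altered features tested, 8 of them in the pathway.
p_pathway = hypergeom_enrichment_p(20000, 50, 300, 8)
is_significant = p_pathway < 0.01
print(p_small, p_pathway, is_significant)
```

Because the calculation uses exact integer binomial coefficients, it avoids floating-point underflow for large backgrounds, at the cost of speed for very long lists.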

To quantify whole-cell transcriptomes with 
greater depth than is possible using sNuc-seq, 
we performed diurnal rhythmic transcriptom- 
ics on ECs and KCs isolated every 6 hours, 
2 weeks after hepatocyte-specific deletion 
of REV-ERBs. The deletion of Rev-erbα and 
Rev-erbβ, along with constitutive induction 
of their repression target Bmal1, was confirmed 
in isolated hepatocytes. Rev-erbα/β gene ex- 
pression was virtually unchanged in the ECs 
and KCs from the HepDKO livers, although 
the amplitude of Rev-erbα/β rhythms was 
muted in KCs (Fig. 2C). The relative expres- 
sion of lineage-specific markers Stab2 (ECs) 
and Csf1r (KCs) confirmed the specificity of 
the cell populations (fig. S4E). 

Despite the physiologically rhythmic expres- 
sion of the core clock genes, the diurnal rhyth- 
mic transcriptomes were extensively remodeled 
in ECs (Fig. 2D, fig. S5A, and table S3). These 
results indicated that disruption of the hepa- 
tocyte clock was communicated to the ECs. 
In addition, we quantified enhancer RNA 
expression in isolated ECs by mapping RNA- 
seq reads to intergenic regions of open chromatin 
determined by assay for transposase-accessible 
chromatin using sequencing (ATAC-seq) (20), 
which identified a widespread reprogram- 
ming of rhythmic enhancers (Fig. 2E, fig. S5B, 
and table S3). 

We next used integrated analysis of motif 
activity and gene expression (IMAGE) (22) to 
ascertain sequence motifs enriched at sites of 
rhythmic enhancers associated with rhythmic 
genes to identify potential TFs with corre- 
sponding binding preferences and diurnal 
rhythmicity. Putative factors potentially 
responsible for rhythm-disrupted and rhythm- 
enhanced enhancers and transcripts were 
identified (table S4). For example, Kruppel- 
like factor 9 (KLF9), a ubiquitous regulator 
of oxidative stress (23), was identified as one 
of the putative TFs responsible for the loss of 
rhythmic enhancers associated with lost rhyth- 
mic genes that peaked between ZT0 and ZT6 
(fig. S5C and Fig. 2F), and there was a positive 
correlation between KLF9 transcription activ- 
ity and its putative target gene expression (Fig. 
2G). Indeed, the expression of Klf9 was rhyth- 
mic in control cells but not in ECs from 
HepDKO livers (Fig. 2H). Conversely, gained 
rhythmic enhancers peaking between ZT6 and 
ZT12 (fig. S5D and Fig. 2I) were enriched for 
the GATA-binding motif (Fig. 2J), correspond- 
ing to a gained rhythmic expression of Gata4, 
a known regulator of the hepatic microvascu- 
lature (24) (Fig. 2K). 

Similarly, the KC rhythmic transcriptome 
was extensively reprogrammed in REV-ERB 
HepDKO livers (Fig. 2L, fig. S5E, and table S5), 
and this was associated with both the loss and 
gain of rhythmic enhancers (Fig. 2M, fig. S5F, 
and table S5). The factors identified as po- 
tentially responsible for rhythmic disrupted 
and enhanced enhancers and transcripts are 
listed in table S4. As an example, the PPAR- 
binding motif was enriched at sites of ZT0 to
ZT6 rhythmic enhancers that decreased in
KCs of the HepDKO livers (fig. S5G and Fig. 
2N) and was associated with the highest transcriptional
activity in this phase (Fig. 2O). Consistent
with the transcriptional activity, the
expression of Ppara, a regulator of the mac- 
rophage inflammatory response (25), was also 
rhythmic, peaking between ZT0 and ZT6 in
control cells but not in KCs isolated from 
HepDKO livers (Fig. 2P). In contrast, the motif 
of Jun dimerization protein 2 (JDP2) was en- 
riched in REV-ERB HepDKO-specific enhancers 
whose activity peaked between ZT12 and ZT18 
(fig. S5H and Fig. 2Q) and also had the highest 
predicted transcriptional activity in this phase 
(Fig. 2R). The phase of the gained rhythmic 
expression of JDP2 was antiphase to its trans- 
criptional activity (Fig. 2S), consistent with 
its transcriptional repression function (26). 
Moreover, comparative analysis of rhythmic 
remodeled transcripts between hepatocytes, 
ECs, and KCs revealed little overlap between 
different cell types, indicating a cell type- 
specific response to loss of REV-ERB in hepa- 
tocytes (fig. S6, A to C). 

To uncover potential signals from hepato- 
cytes lacking REV-ERBs to other cell types, 
we used NicheNet (27) to identify ligand- 
receptor pairs in which the ligand was altered 
in HepDKO hepatocytes, and the receptor 
was expressed in ECs or KCs, and the downstream
genes exhibited enhanced (Fig. 2T and fig. S6D)
or disrupted (Fig. 2U and fig. S6E) rhythms. For
example, the colony stimulating factor 1 gene
Csf1 lost rhythmicity in
HepDKO hepatocytes (Fig. 2V). The CSF1 re- 
ceptor was expressed in KCs, and although 
it was not rhythmically expressed, downstream 
genes of the CSF1 signaling pathways such 
as Cxcl10 (28) lost rhythmicity (Fig. 2V). These 
results demonstrate how disruption of the 
hepatocyte clock could lead to altered diurnal 
rhythms of gene expression in surrounding 
nonhepatocytic cells. Note that this analysis 
does not incorporate posttranscriptional reg- 
ulation of predicted ligands and receptors 
that were not regulated at the transcript level 
(table S6) (29). 
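
The selection logic described above (ligand altered in hepatocytes, receptor expressed in the target cell type, downstream genes with remodeled rhythms) can be illustrated with a small filter. This is only a sketch of the three criteria, not the NicheNet method or its API; the function name `candidate_pairs`, the input layout, and the gene symbol `Csf1r` for the CSF1 receptor are assumptions made for the example.

```python
def candidate_pairs(pairs, altered_ligands, expressed_receptors, remodeled_targets):
    """Return ligand-receptor pairs that satisfy all three criteria.

    pairs: iterable of (ligand, receptor, downstream_genes) tuples (hypothetical layout).
    altered_ligands: ligands whose expression is altered in HepDKO hepatocytes.
    expressed_receptors: receptors expressed in the target cell type (ECs or KCs).
    remodeled_targets: genes whose rhythm is enhanced or disrupted in the target cells.
    """
    hits = []
    for ligand, receptor, targets in pairs:
        # Keep only downstream genes whose rhythm actually changed.
        affected = sorted(set(targets) & set(remodeled_targets))
        if ligand in altered_ligands and receptor in expressed_receptors and affected:
            hits.append((ligand, receptor, affected))
    return hits


# Toy input mirroring the Csf1 example in the text; all other entries are made up.
example = candidate_pairs(
    pairs=[("Csf1", "Csf1r", ["Cxcl10", "GeneX"]), ("LigY", "RecY", ["GeneZ"])],
    altered_ligands={"Csf1"},
    expressed_receptors={"Csf1r", "RecY"},
    remodeled_targets={"Cxcl10"},
)
```

Note that the receptor only has to be expressed, not rhythmic, which matches the CSF1 receptor case described in the text.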

To understand the impact of HepDKO- 
induced diurnal rhythm remodeling on non- 
hepatocytic cells, we performed GSEA on the 
rhythmic transcriptomes of ECs and KCs. Lipid 
metabolism-related pathways were found to 
be enriched in both ECs and KCs (fig. S7, A 
and B). This rhythm remodeling may be reg- 
ulated not only via mapped ligand-receptor 
pairs but also via metabolites from hepato- 
cytes, because we observed rhythmic metab- 
olome reprogramming in isolated hepatocytes 
in the liver upon depletion of REV-ERBs 
(fig. S7, C and D, and table S7). Consistently,
mouse phenotype enrichment analysis (30)
indicated that phenotypes most
enriched in altered rhythmic transcripts 
of both ECs and KCs from HepDKO livers 
were related to homeostasis and metabolism 
(fig. S7, E and F). 

To test this prediction, we performed diur- 
nal rhythmic metabolomic profiling, identify- 
ing many metabolites whose diurnal rhythms 
were disrupted or enhanced in ECs and KCs 
from HepDKO livers (fig. S7G and table S7). 
Integrated analysis of rhythm-remodeled transcripts
and metabolites by MetaboAnalyst (31)
revealed a number of significantly affected 
metabolic pathways. In ECs, multiple rhyth- 
mic metabolic pathways were disrupted, in- 
cluding glutathione metabolism (Fig. 3A and 
fig. S8), as illustrated by expression of the 
Gp! gene and glutathione disulfide (Fig. 3B). 
Other pathways exhibited enhanced diurnal 
rhythmicity, including glucose metabolism 
and its conversion into hexosamines (Fig. 3C), 
as illustrated by the gained rhythm of Pfkl 
gene expression and uridine diphosphate-N- 
acetyl-glucosamine levels (Fig. 3D). These 
changes likely affect the function of ECs, 
which rely on glycolysis for energy production, 
with the hexosamine pathway controlling 
nitric oxide (NO) production and angiogen- 
esis (32). In KCs, the correlated rhythmic 
disrupted transcripts and metabolites were 
related to lipid metabolism (Fig. 3E), exemplified
by Enpp6 gene expression and docosatetraenoic
acid (C22:4) levels (Fig. 3F) (33, 34),
whereas rhythm enhanced pathways included 
one-carbon metabolism (Fig. 3G) regulated 
by the Dhfr gene (Fig. 3H). Together, the cell
type-specific rhythm remodeling in nonhe- 
patocytic cells upon the loss of hepatocyte 
REV-ERBs identifies a previously unknown, 
coordinated response to hepatocyte clock 
disturbance. 

Although light-dark cycles act as zeitgebers 
to entrain behavioral rhythms via the central 
rhythmic oscillator in the SCN of the hypo- 
thalamus, feeding-fasting cycles are important 
synchronizers of peripheral clocks (35-37), 
and time-restricted feeding uncouples liver 
rhythms from behavioral rhythms (35). Hav- 
ing demonstrated the role of the hepatocyte 
clock in controlling cell-autonomous and non- 
cell-autonomous rhythms in the liver, we next 
considered its role in the response to nutrition 
by performing diurnal rhythmic transcriptomic 
analysis on mice subjected to 3 weeks of RPF, 
in which food was available only during the 
light phase (Fig. 4A). As expected, based on 
previous work (35), RPF of control mice led to 
a 12-hour phase shift in the rhythms of core 
clock genes such as Rev-erbα and Rev-erbβ
(Fig. 4B). Transcriptomic analysis indicated 
that nearly all rhythmic transcripts exhibited a 
12-hour phase shift in the livers of control mice 
under RPF (Fig. 4C), suggesting a dominant 
role of feeding on rhythmic phase regulation. 

The rhythm of the core clock gene Bmal1
was also phase shifted by ~12 hours under
RPF in control livers. In contrast, in the livers
of REV-ERB HepDKO mice, Bmal1 expression
was constitutive, robust, and nonrhythmic
both under RPF and ad libitum (ad lib)
feeding (Fig. 4D), indicating cell-autonomous 
clock regulation of the hepatocyte endogenous 
clock by REV-ERBs. Because most rhythmic 
genes were phase shifted ~12 hours by RPF, 
we assessed changes in rhythmicity using a 
classification that integrated amplitude (fold 
change of peak-to-trough ratio > 2), period (be- 
tween 21 and 24 hours), and adjusted P value 
(<0.01) from the JTK algorithm (38). This
analysis identified four categories of rhyth- 
mic genes: (i) HepDKO dominant (rhythmic- 
ity of transcripts is changed only in HepDKO 
livers); (ii) RPF dominant (rhythmicity of tran- 
scripts is changed only in livers from RPF 
mice); (iii) regulated by both HepDKO and 
RPF (including cooperative, redundant, or 
opposing changes); and (iv) retained rhythm 
in HepDKO and RPF (rhythmicity unchanged 
in HepDKO+RPF). 
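
The thresholds and the four categories above can be written down as a small classifier. The sketch below is illustrative only and is not the authors' pipeline: it assumes per-transcript statistics have already been computed for each condition, treats rhythmicity as a yes/no call, and uses an assumed rule for mapping the four conditions onto categories. The names `RhythmStats`, `is_rhythmic`, and `classify` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RhythmStats:
    """Rhythmicity statistics for one transcript in one condition (hypothetical container)."""
    fold_change: float  # peak-to-trough ratio
    period_h: float     # estimated period, in hours
    adj_p: float        # adjusted P value


def is_rhythmic(s: RhythmStats) -> bool:
    # Thresholds quoted in the text: fold change > 2, period 21 to 24 h, adjusted P < 0.01.
    return s.fold_change > 2 and 21 <= s.period_h <= 24 and s.adj_p < 0.01


def classify(ctrl, ko, ctrl_rpf, ko_rpf) -> str:
    """Assign a category from four conditions (assumed rule, for illustration).

    ctrl: control, ad lib; ko: HepDKO, ad lib;
    ctrl_rpf: control under RPF; ko_rpf: HepDKO under RPF.
    """
    base = is_rhythmic(ctrl)
    by_ko = is_rhythmic(ko) != base
    by_rpf = is_rhythmic(ctrl_rpf) != base
    by_combo = is_rhythmic(ko_rpf) != base
    if not (by_ko or by_rpf or by_combo):
        return "retained" if base else "not rhythmic"
    if by_ko and by_combo and not by_rpf:
        return "HepDKO dominant"
    if by_rpf and by_combo and not by_ko:
        return "RPF dominant"
    # Every other pattern of change (cooperative, redundant, or opposing).
    return "both"


rhythmic = RhythmStats(fold_change=3.0, period_h=24.0, adj_p=0.001)
flat = RhythmStats(fold_change=1.2, period_h=24.0, adj_p=0.5)
```

Under this rule, a transcript that loses rhythm in HepDKO livers regardless of feeding, `classify(rhythmic, flat, rhythmic, flat)`, is "HepDKO dominant", whereas one whose HepDKO-induced change is reversed by RPF falls into "both" as an opposing effect.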

Of all rhythmic transcripts, 11.5% were 
HepDKO dominant (Fig. 4E, fig. S9A, and 
table S8A), and on the basis of TF binding 
similarity screening analysis, this group of 
rhythmic transcripts was likely directly regu- 
lated by REV-ERB and its corepressor com- 
plexes (Fig. 4F). RPF-dominant transcripts 
represented 30.7% of rhythmic transcripts 
(Fig. 4E and table S8B), implying non-cell- 
autonomous regulation by feeding. For example, 
the diurnal rhythmicity of Slc25a51 was indistinguishable
in control and HepDKO livers
from ad lib fed mice but disrupted in both 
control-RPF and HepDKO-RPF livers (Fig. 4G). 
Binding sites for STAT (signal transducer and 
activator of transcription) and GR (gluco- 
corticoid receptor) TFs were enriched near 
these genes (Fig. 4F) (39, 40). 

Forty-five percent of rhythmic transcripts 
were regulated by both HepDKO and RPF, 
either cooperatively, oppositely, or redundant- 
ly (Fig. 4E and fig. S9B). Binding sites enriched 
near these genes included those of lipid- 
regulating liver X receptor (Fig. 4F), whose 
activation was reported to be rhythmically 
enhanced in livers of REV-ERBα whole-body
knockout mice (15). Cooperative changes were 
exemplified by the Ppara gene (Fig. 4H and 
table S8C). Note that these results largely re- 
flect hepatocytes, whose Ppara expression 
pattern was different from that shown for 
KCs. In contrast, the HepDKO-induced di- 
urnal rhythmic enhancement of Phf8 was 
negated by RPF while the rhythmic dis- 
rupting effect by RPF on Cend1 was counter- 
acted by HepDKO (fig. S9, B and C). The 
cooperative and opposing effects on rhyth- 
micity demonstrate interdependence of the 
hepatocyte clock and feeding. However, 
for genes classified as redundant, the sep- 
arate effects of HepDKO and RPF on rhyth- 
micity were similar to each other and to 
the combination (e.g., Srebf1) (fig. S9, B 
and D). 

In the final group of rhythmic transcripts, 
although the phase was dependent on food 
entrainment, the rhythmicity per se was re- 
tained in both HepDKO and RPF, suggesting 
that the rhythmic expression of transcripts in 
this group was controlled by other signals in- 
dependent of the intrinsic clock and feeding 
(Fig. 4, E and I, and table S8D). Interesting- 
ly, although the rhythmic mRNA expression 
of core clock genes Bmal1, Cry1, and Per2 was
attenuated upon REV-ERB depletion, the 
binding sites were still enriched in these 
nonintrinsic rhythmic transcripts (Fig. 4F), 
suggesting that systemic signals drive the 
rhythmic transcription activity of these TFs 
(14, 41). 

Finally, we sought to determine the extent 
to which the hepatocyte clock and feeding- 
fasting cycles control diurnal rhythms in non- 
hepatocytes. We defined the EC-specific rhythmic 
genes using RNA-seq data from ECs isolated 
from the HepDKO livers and then determined 
their rhythmic expression during RPF, both in 
control and REV-ERB HepDKO livers. Notably,
~74% of rhythmic genes (Fig. 4J and table
S9A) were regulated by both HepDKO and 
RPF, with enrichment for genes regulating 
NO synthesis (fig. S9E), including EC-specific 
Ddah2 (42) (Fig. 4K). Similarly, in KCs, ~65% 
of cell-specific rhythmic genes were regulated 
by both HepDKO and RPF (Fig. 4L and table
S9B), with enrichment for genes regulating
histone-serine phosphorylation (fig. S9F), including
Dclre1b, which regulates DNA repair
(43) and whose rhythmic expression was KC- 
specific in the liver (Fig. 4M). Thus, nonauton- 
omous signals resulting from feeding and 
communication from hepatocytes play vital 
roles in the rhythmic gene expression of non- 
hepatocytic cells in the liver. 

Our studies shed light on the physiological 
importance and function of peripheral clocks, 
whose existence was originally established 
in vitro (44-46). We demonstrate that some but 
not all hepatocyte diurnal rhythms are con- 
trolled by the core clock in a cell-autonomous 
manner in vivo. Moreover, the enhanced di- 
urnal rhythms upon REV-ERB deletion (e.g., 
DNL genes) suggest that the clock not only 
anticipates daily environment changes but 
also buffers against certain fluctuations. Pre- 
vious studies manipulating the liver clock 
found that it was not essential for weight 
loss due to food restriction during the normal 
feeding period (47) or behavioral diurnal 
rhythms for which the light-dark cycle acts 
as a zeitgeber (7). However, when feeding 
is restricted to the light phase, it becomes 
the predominant hepatocytic zeitgeber for 
the liver (35), and our studies demonstrate 
the hierarchy and interdependence of feed- 
ing and the cell-autonomous clock for di- 
urnal rhythmic hepatocyte gene expression. 
Moreover, rhythmic gene expression and 
metabolism in nonhepatocytic cells in the 
liver are highly influenced both by the he- 
patocyte clock and feeding. These findings 
are likely to apply to peripheral clocks in other 
cell types. 
CORONAVIRUS 


A molecular pore spans the double membrane 
of the coronavirus replication organelle 
Georg Wolff, Ronald W. A. L. Limpens, Jessika C. Zevenhoven-Dobbe, Ulrike Laugks,
Shawn Zheng, Anja W. M. de Jong, Roman I. Koning, David A. Agard, Kay Grünewald,
Abraham J. Koster, Eric J. Snijder, Montserrat Bárcena*


Coronavirus genome replication is associated with virus-induced cytosolic double-membrane vesicles,
which may provide a tailored microenvironment for viral RNA synthesis in the infected cell. However,
it is unclear how newly synthesized genomes and messenger RNAs can travel from these sealed replication
compartments to the cytosol to ensure their translation and the assembly of progeny virions. In this
study, we used cellular cryo-electron microscopy to visualize a molecular pore complex that spans both
membranes of the double-membrane vesicle and would allow export of RNA to the cytosol. A hexameric
assembly of a large viral transmembrane protein was found to form the core of the crown-shaped
complex. This coronavirus-specific structure likely plays a key role in coronavirus replication and thus
constitutes a potential drug target.


Severe acute respiratory syndrome coronavirus 2
(SARS-CoV-2) is the third and
most impactful example of a potentially
lethal coronavirus infection in humans
within the past 20 years (1-3). Coronaviruses
are positive-stranded RNA (+RNA)
viruses that replicate their unusually large ge- 
nomes in the host cell’s cytoplasm. This pro- 
cess is supported by an elaborate virus-induced 
network of transformed endoplasmic retic- 
ulum (ER) membranes known as the viral rep- 
lication organelle (RO) (4-7). Double-membrane 
vesicles (DMVs) are the RO’s most abundant 
component and the central hubs for viral 


RNA synthesis (5). The DMV’s interior accumu- 
lates double-stranded (ds) RNA, presumably 
intermediates of viral genome replication and 
subgenomic mRNA synthesis (4, 5). DMVs 
may offer a favorable microenvironment for 
viral RNA synthesis and may shield viral RNA 
from innate immune sensors that are acti- 
vated by dsRNA. However, coronaviral DMVs 
have been characterized as compartments that 
lack openings to the cytosol (4-6), despite the 
fact that newly-made viral mRNAs need to be 
exported for translation. Moreover, the coro- 
navirus genome needs to be packaged by the 
cytosolic nucleocapsid (N) protein before being 


targeted to virus assembly sites on secretory 
pathway membranes (8). 

In this study, we used cryo-electron micros- 
copy (cryo-EM) to analyze the structure of 
coronavirus-induced ROs in their native host 
cellular environment. The murine hepatitis 
coronavirus (MHV) is a well-studied model 
for the genus Betacoronavirus, which also 
includes severe acute respiratory syndrome 
coronavirus (SARS-CoV), Middle East respira- 
tory syndrome coronavirus (MERS-CoV), and 
SARS-CoV-2. One advantage of MHV over these 
class 3 agents is the absence of serious biosafety 
constraints, thus making MHV suitable for in 
situ cryo-EM studies. We performed electron 
tomography (ET) on cryo-lamellae prepared 
by focused ion beam milling of cells in the 
middle stage of MHV infection. The tomo- 
grams revealed abundant perinuclear DMVs 
with an average diameter of 257 ± 63 nm (±SD),
occasionally interconnected or connected to 
the ER as part of the reticulovesicular network 


¹Section Electron Microscopy, Department of Cell and Chemical
Biology, Leiden University Medical Center, Leiden 2333 ZC,
Netherlands. ²Molecular Virology Laboratory, Department of
Medical Microbiology, Leiden University Medical Center, Leiden
2333 ZA, Netherlands. ³Department of Structural Cell Biology
of Viruses, Centre for Structural Systems Biology, Heinrich
Pette Institute, Leibniz Institute of Experimental Virology,
22607 Hamburg, Germany. ⁴Howard Hughes Medical Institute,
Department of Biochemistry and Biophysics, University of
California San Francisco, San Francisco, CA 94143, USA.
⁵Department of Biochemistry and Biophysics, University of
California San Francisco, San Francisco, CA 94143, USA.
⁶Department of Chemistry, MIN Faculty, Universität Hamburg,
20146 Hamburg, Germany.

*Corresponding author. Email: m.barcena@lumc.nl 


[Fig. 1B color key: DMV outer membrane; DMV inner membrane; ER; ERGIC; DMV luminal filaments; virus particle envelope; spike protein; RNP; ribosomes; molecular pore]


Fig. 1. Coronavirus-induced DMVs revealed by cryo-ET. (A) Tomographic slice (7 nm thick) of a cryo-lamella milled through an MHV-infected cell at a middle stage of 
infection. (B) Three-dimensional (3D) model of the tomogram, with the segmented content annotated. See also movie S1. ERGIC, ER-to-Golgi intermediate compartment. 
Fig. 2. Architecture of the molecular pores embedded in DMV membranes. Tomographic slices (7 nm thick) 
revealed that pore complexes were present in both (A) MHV-induced DMVs and (B) prefixed SARS-CoV-2- 
induced DMVs (white arrowheads). The inset in (A) is a close-up view of the area delineated by white brackets. 
(C to L) Sixfold-symmetrized subtomogram average of the pore complexes in MHV-induced DMVs. (C) 
Central slice through the average, suggesting the presence of flexible or variable masses near the prongs 
(black arrowhead) and on the DMV luminal side. (D to F) Different views of the 3D surface-rendered model of 
the pore complex (copper colored) embedded in the outer (yellow) and inner (blue) DMV membranes. 

(G to L) 2D cross-section slices along the pore complex at different heights (see also movie S2). (M and N) An 
additional density at the bottom of the sixfold-symmetrized volume (c6, green) appeared as an off-center 


asymmetric density in the unsymmetrized average (c 


described in previous work (Fig. 1, fig. S1, and 
movie S1) (4-7). In addition, macromolecular 
features that had not been discerned in con- 
ventional EM samples became apparent (figs. 
S2 to S4). The DMV lumen appeared to pri- 
marily contain filamentous structures that 
likely correspond to viral RNA (Fig. 1 and fig. 
S4). In part, this is expected to be present as 
dsRNA (4, 5), as supported by the relatively 
long, straight stretches observed in some of 
these filaments, consistent with the persist- 
ence length of dsRNA (9) (fig. S4). 

Each DMV contained multiple copies of a 
molecular complex that spanned both mem- 
branes, connecting the DMV interior with the 
cytosol (Fig. 2A and supplementary text). Such 
complexes were also found in DMVs in pre- 
fixed SARS-CoV-2-infected cells (Fig. 2B and 
fig. S5). We surmise that this pore represents a 
generic coronaviral molecular complex that 
has a pivotal role in the viral replication cycle. 
Most likely, it allows the export of newly syn- 
thesized viral RNA from the DMVs to the cy- 
tosol. Functionally analogous viral complexes 
used for RNA export include those in the cap- 
sids of the Reoviridae (10) and, notably, the 
molecular pore in the neck of the invaginated 
replication spherules induced by flock house 
virus (11). None of these complexes, however, 
are integrated in a double-membrane organelle. 

Subtomogram averaging of the double- 
membrane-spanning complexes in MHV-induced 
DMVs revealed an overall sixfold symmetry 
(Fig. 2, fig. S6, and movie S2). A cytosolic crown- 
like structure extended ~13 nm into the cytosol 
and was based on a ~24-nm-wide platform em- 
bedded in the DMV membranes. The two mem- 
branes did not fuse and maintained the typical 
DMV intermembrane spacing of ~4.5 nm (fig. 
S2). The complex formed a channel that followed
its sixfold axis. On the DMV luminal
side, the channel started with a ~6-nm-wide 
opening, narrowed toward the cytosol, and 
had two tight transition points (Fig. 2, J and L). 
The one at the level of the DMV outer mem- 
brane (Fig. 2J) was the most constricted, with 
an opening of ~2 to 3 nm, but would still allow 
the transition of RNA strands. Toward the cy- 
tosolic space, the complex opened into a 
crown-like structure, exposing six cytosolic 
“prongs.” With an achieved resolution of 3.1 nm, 
we roughly estimate that the complex has a 
total molecular mass of 3 MDa, of which the 
crown represents ~1.2 MDa (fig. S6). 

We then considered the possible constitu- 
ents of this complex. Coronaviruses express 
two large replicase polyproteins that are pro- 
teolytically cleaved into 16 nonstructural pro- 
teins (nsps) (12). Three of these nsps—nsp3 
(222 kDa in MHV), nsp4 (56 kDa), and nsp6 
(33 kDa)—are transmembrane proteins and 
thus are potential components of the pore. 
These nsps contain two, four, and six trans- 
membrane domains (TMDs), respectively (13-15) 
(Fig. 3A), and engage in diverse homotypic 
and heterotypic interactions (16) thought to 
drive the formation of double-membrane 
ROs (17-19). On the basis of its size, the multi- 
domain MHV nsp3 subunit is an attractive 
putative constituent of the pore. MHV nsp3 
consists of a large cytosolic region of ~160 kDa, 
followed by two TMDs and a C-terminal cyto- 
solic domain of ~41 kDa (13). Whereas the TMDs 
and C-terminal domain are highly conserved, 
the domain composition and size of the N- 
terminal part of nsp3 is quite variable among 
coronaviruses (16, 20). Several nsp3 domains, 
including the conserved N-terminal ubiquitin- 
like domain 1 (Ubl1; 12.6 kDa) that binds both 
single-stranded RNA (21) and the N protein 
(22, 23), may interact with viral RNA (16). 

To investigate whether nsp3 is a component 
of the DMV molecular pore, we imaged cells 
infected with a well-characterized engineered 
MHV expressing nsp3 with an enhanced green 
fluorescent protein (EGFP) moiety fused to the 
Ubl1 domain [MHV-Δ2-GFP3 (24)] (fig. S7). 
This mutant lacks nsp2, which is dispensable 
for replication in cell culture (25). Subtomo- 
gram averaging of the pore complexes in these 
samples (Fig. 3B) revealed the presence of six 
additional densities on top of the prongs, each 
representing a mass compatible with that of 
EGFP (Fig. 3, C to E, and movie S3). These re- 
sults identified nsp3 as a major constituent of 
the complex and provided insight into its ori- 
entation, with the Ubl1 domain residing in the 
prongs. Six copies of nsp3 can be envisioned 
to constitute most of the cytosolic crown-like 
structure (~1.2 MDa). Other viral and/or host 
proteins and lipids are probably also part of 
the ~1.8-MDa intermembrane platform, with 
nsp4 and nsp6 being prominent candidates. 
Notably, different studies suggest that nsp3- 
nsp4 interactions drive membrane pairing 
and determine DMV biogenesis and mor- 
phology (17-19, 26). 
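
The mass bookkeeping behind this assignment can be checked with simple arithmetic. The sketch below uses only the approximate values quoted in the text; the variable names are ours, and comparing six copies of the N-terminal cytosolic region of nsp3 with the crown is an illustrative assumption rather than the authors' calculation.

```python
# Approximate masses in kDa, as quoted in the text.
TOTAL_COMPLEX_KDA = 3000    # ~3 MDa estimated for the whole pore complex
CROWN_KDA = 1200            # ~1.2 MDa estimated for the cytosolic crown
NSP3_FULL_KDA = 222         # full-length MHV nsp3
NSP3_N_CYTOSOLIC_KDA = 160  # large N-terminal cytosolic region of nsp3
COPIES = 6                  # sixfold symmetry of the complex

# Mass left for the intermembrane platform once the crown is subtracted.
platform_kda = TOTAL_COMPLEX_KDA - CROWN_KDA

# Six copies of the cytosolic region of nsp3, compared with the crown estimate.
six_cytosolic_kda = COPIES * NSP3_N_CYTOSOLIC_KDA
crown_fraction = six_cytosolic_kda / CROWN_KDA

# Six full-length nsp3 copies, compared with the whole complex.
six_full_kda = COPIES * NSP3_FULL_KDA
unassigned_kda = TOTAL_COMPLEX_KDA - six_full_kda

print(f"platform: {platform_kda} kDa")
print(f"six cytosolic nsp3 regions: {six_cytosolic_kda} kDa ({crown_fraction:.0%} of crown)")
print(f"six full-length nsp3: {six_full_kda} kDa, leaving {unassigned_kda} kDa unassigned")
```

On these numbers, six cytosolic nsp3 regions account for most of the crown, and a substantial share of the total mass remains available for other components such as nsp4, nsp6, host proteins, and lipids.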

The molecular pores frequently appeared 
to interact with other macromolecules on both 
the cytosolic and DMV luminal sides (fig. S8). 
In the subtomogram averages, these appeared 
as largely blurred out densities (Fig. 2C), which 
suggests that the interactions are dynamic. A 
small region on the luminal side, however, had 
a relatively higher density and was resolved 
in the unsymmetrized average as a closely 
associated and slightly off-center mass (Fig. 
2, M and N, and fig. S6D). We speculate that 
this mass may be part of the viral repli- 
cation machinery. The coronaviral replication- 
transcription complex (RTC) is thought to 
consist of a subset of relatively small (~10 to 
110 kDa) nsps, with the RNA-dependent RNA 
polymerase (nsp12) at its core (27-29). How- 
ever, some of these subunits may associate 
with the RTC only transiently, and the nsp 
stoichiometries of the complex are unknown. 


The luminal partners of the pore complex, 
prominent as masses varying in shape and 
size, appeared to interact with the putative 
RNA content of the DMVs (fig. S8). 

The interaction partners of the cytosolic nsp3 
prong ranged from chain-like masses to larger 
assemblies (fig. S8, black arrowheads). The 
subdomains of the long N-terminal nsp3 do- 
main engage in a range of viral and virus-host 
interactions (16, 20); consequently, the list 
of possible interactors is substantial. Among 
Fig. 3. The coronavirus transmembrane protein nsp3 is a component of the pore complex. (A) (Top) 
Membrane topology of MHV transmembrane nsps, with protease cleavage sites indicated by orange (PL1pro), 
red (PL2pro), and gray (Mpro) arrowheads. (Bottom) Detailed depiction of nsp3, showing some of its 
subdomains and the position of the additional EGFP moiety present in MHV-Δ2-GFP3. PLpro, papain-like 
protease; Mpro, main protease. (B) Tomographic slice of DMVs induced by MHV-Δ2-GFP3, with embedded 
pore complexes (white arrowheads). (C and D) Comparison of the central slices of the sixfold-symmetrized 
subtomogram averages of the pore complexes in DMVs induced by (C) wild-type (wt) MHV and (D) MHV-Δ2-
GFP3. (E) Density differences of 3 standard deviations between the mutant and wild-type structures, shown 
as a green overlay over the latter, revealed the presence of additional (EGFP) masses in the mutant complex 
(black arrowheads; see also movie S3). 
Fig. 4. Model of the coronavirus genomic RNA transit from the DMV lumen to virus budding sites. 
Tomographic slices from MHV-infected cells (top) highlight the respective steps in the model (bottom). 
(A) The molecular pore exports viral RNA into the cytosol, (B) where it can be encapsidated by 
N protein. (C) Cytosolic RNP complexes can then travel to virus assembly sites for membrane association 
and (D) subsequent budding of virions. The insets in the top panels provide close-up views of the areas 
delineated by white brackets. 
them, the viral N protein (55 kDa), which binds 
to the nsp3 Ubl1 domain (22, 23), is a prominent 
candidate. The Ubl1-N interaction has been 
proposed to target viral RNA to replication 
sites at early stages of infection (23), but it may 
also modulate RNA exit and encapsidation 
on the cytosolic side of the pore complex. No- 
tably, DMV-rich regions of the cytosol were 
crowded with protein assemblies that had a 
diameter of ~15 nm (fig. S9). These proteins 
strongly resembled the nucleocapsid struc- 
ture in coronavirus particles, a helical ribo- 
nucleoprotein (RNP) complex that consists 
of the RNA genome and N protein oligomers 
(30) (fig. S9). 

Our findings suggest a pathway for newly 
made viral genomic RNA from the DMV in- 
terior, via the channel of the pore, to the cy- 
tosolic sites of encapsidation. In our model, 
specific replicase subunits may associate with 
the pore complex to guide the newly synthe- 
sized RNA toward it (Fig. 4A). As proposed for 
other +RNA viral ROs (11), only +RNAs would 
need to be exported, whereas negative-stranded 
templates and/or dsRNA intermediates could 
remain inside the DMVs. On the cytosolic side, 
all exported viral mRNAs may associate with 
the N protein (Fig. 4B). Alternatively, the ac- 
cumulating N protein could serve to select part 
of the newly made genomes for packaging. 
The remainder would then be used for trans- 
lation, together with the much smaller, though 
much more abundant, subgenomic mRNAs 
(31). Genome-containing RNP complexes would 
travel to the membranes where the viral en- 
velope proteins accumulate and engage in the 
assembly of progeny virions (Fig. 4C) (32). These 
bud into single-membrane compartments (Fig. 
4D), typically derived from the ER-to-Golgi in- 
termediate compartment (8), and travel along 
the secretory pathway to be released into extra- 
cellular space. 

The double-membrane-spanning molecular 
pore revealed here may constitute the exit 
pathway for coronaviral RNA products from 
the DMV’s interior toward the cytosol, with 
the large and multifunctional nsp3 being its 
central component. Although the exact mode 
of function of this molecular pore remains to 
be elucidated, it seems to be a key structure 
in the viral replication cycle that is likely con- 
served among coronaviruses and thus may 
offer a coronavirus-specific drug target. 
Adjustable Tip Spacing Pipettes 
Eppendorf announces Move It Adjust- 
able Tip Spacing Pipettes, an efficient, 
safe solution for synchronous pipetting 
of a series of samples between differ- 
ent vessel formats, such as between 
tubes and plates. Their operating time 
compared to that of single-channel pipettes is thus significantly re- 
duced. Autoclavability is also enabled, which additionally increases 
user and sample safety. In practice, single-channel pipettes have 
been used to transfer individual samples from one vessel format 
to another—an inefficient, tedious, and error-prone method for 
routine work. Move It significantly accelerates and simplifies the 
workflow when frequent format changes are required. Further, 
manual adjustment of the tip distance with the adjustment knob 
allows the user to maintain a relaxed, ergonomic hand position 
even during format change. Move It is available with 4, 6, 8, and 12 
channels, and can be mechanical or electronic. 

Eppendorf 

For info: 800-645-3050 

www.eppendorf.com/move-it 


SARS-CoV-2 IgG ELISA Kit 

Our SARS-CoV-2 IgG ELISA Kit is an enzyme-linked immunosorbent 
assay (ELISA) designed for qualitative detection of immunoglobulin 
G (IgG) antibodies specific to SARS-CoV-2 in human serum samples 
issued under emergency use authorization as an aid in identifying 
individuals with an adaptive immune response to SARS-CoV-2. For- 
matted in a 96-well microplate (12 x 8 well strips), the assay is easily 
adapted for either automated open platforms or manual workflows. 
Designed for accurate and sensitive detection, the indirect ELISA has 
a two-step binding process involving a specific SARS-CoV-2 antigen 
and a horseradish peroxidase-conjugated antihuman IgG secondary 
antibody. 

Enzo Life Sciences 

For info: 800-942-0430 

www.enzolifesciences.com/enz-kit170/sars-cov-2-igg-elisa-kit 


Anti-Certolizumab Pegol Antibodies 

Bio-Rad Laboratories has launched a range of anti-certolizumab 
pegol inhibitory antibodies to support the development of assays 
for therapeutic drug monitoring for all five marketed tumor necrosis 
factor alpha (TNFα) inhibitor biologics and their biosimilars. The 
range comprises three antibodies that are highly specific for the 
monoclonal antibody antigen-binding fragment certolizumab pegol 
(Cimzia) and that inhibit the binding of this drug to its target, TNFα. 
The antibodies are fully human in full-length immunoglobulin G (IgG) 
format and can be used as a surrogate positive control or calibration 
standard for an antidrug antibody (ADA) assay to measure levels 
of patient antidrug antibodies. TNFα inhibitors are used to treat a 
wide range of inflammatory conditions, such as rheumatoid arthritis, 
Crohn's disease, and psoriasis. The levels of serum drug and ADA 
concentrations are monitored in patients receiving TNFα antagonists 
to help guide clinical decision-making, optimize treatment, improve 
outcomes, and reduce health care costs. 

Bio-Rad 

For info: 800-424-6723 

www.bio-rad-antibodies.com 
new products 


Skeletal Muscle Differentiation Kit 

AMS Biotechnology offers a Skeletal Muscle Differentiation Kit that 
enables you to differentiate human pluripotent stem cells to skeletal 
muscle myotubes with high yields, without cell sorting or genetic 
manipulation. Myotubes are contractile, express typical muscle 
markers, and show striated sarcomeres. Until recently, methods of 
studying muscular disease and potential therapies depended on 
invasive muscle biopsies to produce limited batches of primary cells. 
Use of primary cells presents challenges, not only in the collection 
process but also related to inconsistencies in cell growth, behavior, 
and life span, making it difficult to generate reliable experimental 
models. Our revolutionary Skeletal Muscle Differentiation Kit allows 
researchers to generate muscle from human pluripotent stem cells 
in three easy steps, via satellite-like or progenitor cells and myoblasts 
that then fuse to multinucleated myotubes in the third step. The kit 
protocol generates a highly pure population of approximately 70% 
skeletal muscle myotubes in a reproducible fashion. 

AMS Biotechnology 

For info: 617-945-5033 
www.amsbio.com/skeletal-muscle-differentiation-kits.aspx 


Inflammation Panel Kit 

The Human QBeads Inflammation Panel Kit allows the measure- 
ment of seven human cytokines and chemokines typically associated 
with inflammatory responses to disease states, such as autoim- 
mune diseases, chronic inflammation, and infections—including 
viral infections such as COVID-19. These custom-built, ready-to-run 
kits provide a solution for monitoring key biomarkers involved in cytokine 
release syndrome. Analytes offered in the Human QBeads Inflammation 
Panel Kit include human interferon gamma, interleukin-2, 
interleukin-6, CCL2 (MCP-1), CCL3 (MIP-1α), CXCL9 (MIG), and CXCL10 
(IP-10). Cytokine storm syndrome, also known as cytokine release 
syndrome, is an inflammatory response commonly caused by viral 
infections. It is characterized by excessive or uncontrolled release of 
proinflammatory cytokines. Respiratory virus infections can induce 
abnormal cytokine production in the host. To better understand 
host defense mechanisms against viruses, it is important to monitor 
cytokine production and signaling pathways during viral infection. 

Intellicyt 

For info: 734-769-1600 

intellicyt.com/inflammation-panel-kit 


Liquid Nitrogen-Based Automated Storage System 

Brooks Life Sciences offers BioStore IIIv, an automated, next- 
generation alternative to manual -80°C mechanical freezers. 
BioStore IIIv operates like a vending machine: after a secure login 
has been entered on the touchscreen, the patented design automatically 
locates and lifts the storage racks and ejects the targeted 
material once the user indicates they are ready to receive it. This 
design protects all materials by providing a stable temperature 
during inventory interactions. For comparison, a mechanical freezer 
chamber can warm by as much as 40° during a routine door opening, 
with additional time required for the system to recool when the 
door is closed. A single BioStore IIIv freezer holds up to 63,000 2.0-mL 
vials. In the event of a natural or man-made disaster, BioStore IIIv 
has built-in protection, maintaining safe temperatures for upwards 
of 4 days, an improvement of 16X over -80°C mechanical freezers. 

Brooks Life Sciences 

For info: 800-379-7221 

www.brooks.com 
VANDERBILT UNIVERSITY 


MEDICAL CENTER 


FACULTY POSITION IN COMPUTATIONAL 
MICROBIOLOGY AND IMMUNOLOGY 


VANDERBILT UNIVERSITY MEDICAL CENTER 
Department of Pathology, Microbiology, and Immunology 
Department of Medicine 
Program in Computational Microbiology and Immunology 
Vanderbilt Center for Immunobiology 
Vanderbilt Institute for Infection, Immunology 
and Inflammation 


Vanderbilt University Medical Center, through an institution- 
wide initiative, invites applications for faculty positions in 
Computational Microbiology and Immunology at all levels. 
Successful candidates will be expected to establish and maintain 
an independent research program focusing on the development 
and application of computational methods in the areas of 
microbiology and/or immunology, and to participate in teaching 
of graduate and medical students. Areas of particular interest 
include, but are not limited to, host-pathogen interactions, 
cancer immunology, microbiome research, evolutionary 
microbiology, and systems immunology. Candidates should 
have substantial post-graduate training highlighted by peer- 
reviewed publications that demonstrate research productivity. 


Vanderbilt University Medical Center, located on the Vanderbilt 
University campus, is home to internationally recognized 
programs in bioinformatics, drug discovery, global health, 
inflammation, imaging science, pharmacology, proteomics, 
and vaccine science. The School of Medicine consistently ranks 
in the Top 20 US Medical Schools and provides outstanding 
opportunities for scholarship, collaboration, and teaching. 


The Vanderbilt University campus is located in the heart of 
Nashville, the capital of Tennessee, known internationally as 
“Music City, USA.” Nashville is also home to professional 
sports teams, the Nashville Symphony, the Frist Center for the 
Visual Arts, and numerous activities for outdoor enthusiasts. 
Nashville is a wonderful place to live, work, and raise a family. 


Applicants should send a curriculum vitae, a statement of current 
and future research interests, and at least 3 letters of reference 
to: Ivelin Georgiev, Ph.D., Director, Program in Computational 
Microbiology and Immunology, Vanderbilt University Medical 
Center at vi4d.cmi@vumc.org. Review of applications will 
commence immediately, but all applications received by 
October 31, 2020 will receive full consideration. 


Vanderbilt University Medical Center is an Affirmative Action/ 
Equal Opportunity Employer. Women and minority candidates 
are encouraged to apply. 


Yale University 
School of Medicine 


FACULTY POSITION AT THE ASSISTANT 
PROFESSOR LEVEL 


DEPARTMENT OF CELLULAR AND 
MOLECULAR PHYSIOLOGY 


The Department of Cellular and Molecular Physiology is conducting 
a search for new faculty members at the assistant professor level. 


The search seeks candidates whose research connects the properties 
of molecules to the properties of physiological systems. 


Excellent opportunities are available for collaborative research, 
as well as for graduate and medical student teaching. Candidates 
must hold a Ph.D., M.D., or equivalent degree. Applicants should 
include a cover letter, curriculum vitae, a statement that describes 
past research accomplishments and future goals, and should arrange 
to have three letters of reference sent. Applicants should apply at the 
following website: http://apply.interfolio.com/78493 


Application Deadline: November 2, 2020 

Yale University is an Affirmative Action/Equal Opportunity 
Employer and welcomes applications from women, persons with 
disabilities, protected veterans, and members of minority groups. 


SAINT LOUIS UNIVERSITY 


POSTDOCTORAL POSITIONS 
DEPARTMENT OF BIOCHEMISTRY 
AND MOLECULAR BIOLOGY 
SAINT LOUIS UNIVERSITY SCHOOL OF MEDICINE 


Saint Louis University, a Catholic Jesuit institution dedicated 
to education, research, health care, and service, is seeking 
outstanding applicants for postdoctoral positions to study the 
structure and enzymology of several coagulation factors in the 
laboratory of Dr. Enrico Di Cera in the Edward A. Doisy 
Department of Biochemistry and Molecular Biology 
(https://biochem.slu.edu/faculty/dicerawp). Experience in rapid kinetics, 
smFRET, X-ray crystallography, NMR, or cryoEM is required. 
Please send your CV to enrico@slu.edu and submit a cover letter, 
curriculum vitae, application, and the addresses of three references at: 
https://slu.wd5.myworkdayjobs.com/en-US/Careers/job/SLU- 
Saint-Louis-MO/Post-Doctoral-Fellow_2020-00705. 


Saint Louis University is an Affirmative Action, 
Equal Opportunity Employer, and encourages nominations 
and applications of women and minorities. 
The Paris Region 
Fellowship Program 


Two calls for a total of 52 post-doctoral 
positions in a large panel of disciplines for 
international researchers of all nationalities, 


Professional training programme and career 
development opportunities for recruited 
researchers, 


Call deadline: October 21, 2020 


Further information: 
parisregion.eu/parisregionfp.html 


Application website: 
https://parisregionfp.sciencescall.org/ 


This project has received funding from the 
European Union’s Horizon 2020 research 
and innovation programme under the Marie 
Skłodowska-Curie actions Grant Agreement 
n° 945298-ParisRegionFP. 




Fellowships for Postdoctoral Scholars 
at Woods Hole Oceanographic Institution 


New or recent doctoral recipients are encouraged to 
submit applications prior to October 15, 2020. 


Awards related to the following areas are anticipated: 
Applied Ocean Physics & Engineering; Biology; Geology & 
Geophysics; Marine Chemistry & Geochemistry; Physical 
Oceanography; The Center for Marine and Environmental 
Radioactivity; The National Ocean Sciences Accelerator Mass 
Spectrometry Facility; The Ocean Bottom Seismic Instrument 
Center; The Ocean Twilight Zone Project; and a joint USGS/ 
WHOI award. Interdepartmental research is also encouraged. 


Awards are competitive, with primary emphasis on research 
promise. Scholarships run 18 months and include an annual 
stipend of $62,250, a health and welfare allowance, and 
a research budget. Recipients are encouraged to pursue 
their own research interests in association with resident staff. 
Communication with potential WHOI advisors prior 
to submitting an application is encouraged. 

Recipients of awards can begin any time after 
January 1 and before December 1, 2021. 
VIRGINIA TECH 


Faculty Position in 
Quantitative 
Neuroscience 


We seek a motivated individual to be part of a next generation 
effort in human neuroscience organized around novel 
approaches in neuroimaging, neuro-modeling, and human 
neuroscience. We are particularly interested in applicants 
with an interest in applying their approaches or developing 
new ones in the area of Computational Psychiatry. In this 
area, we seek applications for a tenure-track Assistant 
Professor. While there are no restrictions on the training 
trajectory of the successful applicant, it is paramount that 
they demonstrate through their publication history a capacity 
for transdisciplinary work. The successful applicant will be 
expected to interact with the existing group of investigators 
in the Center for Human Neuroscience Research and 
the Computational Psychiatry Unit (http://labs.vtc.vt.edu/cpu/) 
at the Fralin Biomedical Research Institute 
at Virginia Tech Carilion (https://research.vtc.vt.edu). 
The institute has three state-of-the-art 3T MRI machines 
dedicated to research, a state-of-the-art optically pumped 
magnetometry suite, laboratory facilities focused on in vivo 
electrochemical measurement of neurotransmitters, a large 
server farm dedicated to computational neuroscience and a 
focus on decision-making that underlies health behaviors 
and psychopathology. 


Applications will be reviewed continuously until the 
position is filled. It is expected that initial interviews will 
begin in September of 2020. To apply, please submit your 
application including curriculum vitae, detailed statement of 
research accomplishments and plans and teaching/mentoring 
philosophy at www.jobs.vt.edu, posting # O1101F. Also, 
have at least three references post their letters of support to 
the same site. 


Virginia Tech recognizes the critical importance of diverse 
teams of scholars. It seeks to diversify its faculty along 
multiple dimensions. Virginia Tech is a public global 
land-grant university, committed to research, teaching and 
learning, and outreach to the Commonwealth of Virginia, 
the nation, and the world. Building on its motto of Ut Prosim 
(that I may serve), Virginia Tech is dedicated to InclusiveVT 
(https://www.inclusive.vt.edu/), serving in the spirit of 
community, diversity, and excellence. We seek candidates 
who adopt and practice the Principles of Community 
(https://www.inclusive.vt.edu/Programs/vtpoc0.html), 
which are fundamental to our on-going efforts to increase 
access and inclusion and to create a community that nurtures 
learning and growth for all of its members. 


Inquiries about the position should be directed to the chair 
of the search committee, P. Read Montague (read@vtc.vt.edu), 
or the director of the institute, Michael J. Friedlander 
(friedlan@vtc.vt.edu). 


Virginia Tech is an Equal Opportunity Employer. For 
inquiries regarding non-discrimination policies, contact the 
Office of Equity and Access at 540-231-2010 or Virginia 
Tech, North End Center, Suite 2300 (0318), 300 Turner St. 
NW, Blacksburg, VA 24061. 
Data-driven advice for grad school 


“Do you have any advice for future graduate students?” I asked. The student had recently defended 
his Ph.D., and I was conducting an exit interview, something I do with every graduating 
biomedical Ph.D. student at my university, where I am in charge of evaluating our medical school’s 
Ph.D. training programs. He sat back in his chair and thought for a minute before responding: 
He wished he had started to plan for his post-Ph.D. career earlier. My shoulders dropped and 
I let out a sigh. “Program directors recommend this to incoming students every year, but some 
don’t seem to hear it,” I said. “How do you think we can get them to listen?” This time, he didn’t hesitate. 
“They are graduate students in science,” he exclaimed. “Show them the data!” 


That was my aha moment. I im- 
mediately began to document the 
responses to this question in subse- 
quent interviews. It has been 3 years 
now, and the data I’ve collected 
confirm my suspicions—the same 
answers come up again and again. 
As a new cohort of Ph.D. students 
starts grad school this fall, here are 
the five pieces of advice graduates 
offer most frequently. 


CHOOSE YOUR MENTOR CAREFULLY 
Thirty-two percent of graduating 
students said this is the most criti- 
cal decision a Ph.D. student can 
make. Many students gravitate to- 
ward mentors who work in areas 
they find interesting and exciting, 
but it is also important to think 
about what style of mentoring you 
respond to best. Finding a mentor 
with the right mentoring approach for you is at least as 
important as finding one who studies a specific topic. 


START PLANNING YOUR FUTURE CAREER EARLY 

You need time to (a) decide which career paths you find ap- 
pealing and (b) start preparing for those careers. Twenty per- 
cent of graduating students recommended exploring future 
careers as early as possible so you can use your time in grad 
school to build additional skills you will need. To learn about 
specific professions, you can conduct informational inter- 
views, attend seminars where alumni discuss their careers, 
do an internship, or engage in a variety of other options. 


PAY ATTENTION TO YOUR MENTAL HEALTH 

Graduate school is full of ups and downs. Thirteen percent of 
graduates said that if you feel the need to talk to someone on 
or off campus, don’t hesitate. “If you are not happy, try to do 
something about it and make a change,” one student said. 


If you feel isolated, another stu- 
dent recommended joining a cam- 
pus group to connect with others. 


MAP OUT YOUR GOALS 

Twelve percent of graduates recom- 
mended that students consistently 
and critically evaluate their progress 
throughout their training. Make an 
outline of your research and career 
goals and when you want to achieve 
them, and hold yourself to that plan. 
Some students use an individual 
development plan to prompt discus- 
sions with their mentor and thesis 
committee. But don’t wait for these 
meetings; setting goals and holding 
yourself accountable should be a 
continuous habit. 


FIND WORK-LIFE BALANCE 
This looks different for different peo- 
ple, but don’t ignore it. You should expect to work hard in grad 
school, but the right work-life balance can have an important 
influence on your mental health and overall quality of life. 
Nine percent of graduates recommended finding something 
that helps you unwind, such as pursuing hobbies, getting 
together with friends, or volunteering in the community. 

Observant readers may notice that the numbers above 
only add up to 86%. Other pieces of advice included: be 
assertive and ask for what you need, learn to trust your 
experimental results as long as the controls work, and plan 
your projects around what’s needed for a publishable paper. 
But the most important thing is to take these pointers to 
heart early on. Even when you’re just getting started, you 
need to look forward. 


Abigail M. Brown is the director of outcomes research for biomedical 
Ph.D. programs at Vanderbilt University School of Medicine in Nashville, 
Tennessee. Send your career story to SciCareerEditor@aaas.org. 